DIRECTED EVOLUTION OF NOVEL BINDING PROTEINS
This is a continuation of Serial No. 08/993,776 filed 
December 18, 1997, now pending; which is a continuation of 
Serial No. 08/415,922, filed April 3, 1995, now U.S. Patent 
No. 5,837,500; which is a continuation of Serial No.
08/009,319, filed January 26, 1993, now U.S. Patent No. 
5,403,484; which is a division of Serial No. 07/664,989, filed 
March 1, 1991, now U.S. Patent No. 5,223,409; which is a 
continuation-in-part of Serial No. 07/487,063, filed March 2, 
1990, now abandoned; which is a continuation-in-part of Serial
No. 07/240,160, filed September 2, 1988, now abandoned. 
The prior application(s) set forth above are hereby
incorporated by reference in their entirety. 
Cross-reference to Related Applications 
The following related and commonly-owned applications are
also incorporated by reference:

Robert Charles Ladner, Sonia Kosow Guterman, Rachael 
Baribault Kent, and Arthur Charles Ley are named as joint 
inventors on U.S. S.N. 07/293,980, filed January 8, 1989, now
Patent No. 5,096,815, and entitled GENERATION AND SELECTION OF
NOVEL DNA-BINDING PROTEINS AND POLYPEPTIDES. This application
has been assigned to Protein Engineering Corporation. 

Robert Charles Ladner, Sonia Kosow Guterman, and Bruce 
Lindsay Roberts are named as joint inventors on U.S. S.N.
07/470,651, filed 26 January 1990, now abandoned, entitled
"PRODUCTION OF NOVEL SEQUENCE-SPECIFIC DNA-ALTERING ENZYMES",
likewise assigned to Protein Engineering Corp. Ladner,
Guterman, Kent, Ley, and Markland, Ser. No. 07/558,011, now
Patent No. 5,198,346, is also assigned to Protein Engineering
Corporation.
BACKGROUND OF THE INVENTION

Field of the Invention 

This invention relates to development of novel binding
proteins (including mini-proteins) by an iterative process of
mutagenesis, expression, chromatographic selection, and
amplification. In this process, a gene encoding a potential
binding domain, said gene being obtained by random mutagenesis
of a limited number of predetermined codons, is fused to a
genetic element which causes the resulting chimeric expression
product to be displayed on the outer surface of a virus
(especially a filamentous phage) or a cell. Chromatographic
selection is then used to identify viruses or cells whose
genome includes such a fused gene coding for a protein
that binds to the chromatographic target.

Information Disclosure Statement
A. Protein Structure 

The amino acid sequence of a protein determines its 
three-dimensional (3D) structure, which in turn determines 
protein function (EPST63, ANFI73). Shortle (SHOR85), Sauer
and colleagues (PAKU86, REID88a), and Caruthers and colleagues
(EISE85) have shown that some residues on the polypeptide
chain are more important than others in determining the 3D
structure of a protein. The 3D structure is essentially
unaffected by the identity of the amino acids at some loci; at
other loci only one or a few types of amino acid are allowed.
In most cases, loci where wide variety is allowed have the
amino acid side group directed toward the solvent. Loci where
limited variety is allowed frequently have the side group
directed toward other parts of the protein. Thus
substitutions of amino acids that are exposed to solvent are
less likely to affect the 3D structure than are substitutions
at internal loci. (See also SCHU79, p169-171 and CREI84,
p239-245, 314-315).

The secondary structure (helices, sheets, turns, loops)
of a protein is determined mostly by local sequence. Although
certain amino acids have a propensity to appear in certain
secondary structures, they will be found from time to time in
other structures, and studies of pentapeptide sequences found
in different proteins have shown that their conformation varies
considerably from one occurrence to the next (KABS84, ARGO87).
As a result, a priori design of proteins to have a particular
3D structure is difficult.

Several researchers have designed and synthesized
proteins de novo (MOSE83, MOSE87, ERIC86). These designed
proteins are small and most have been synthesized in vitro as
polypeptides rather than genetically. Hecht et al. (HECH90)
have produced a designed protein genetically. Moser et al.
state that design of biologically active proteins is currently
impossible.

B. Protein Binding Activity 

Many proteins bind non-covalently but very tightly and
specifically to some other characteristic molecules (SCHU79,
CREI84). In each case the binding results from
complementarity of the surfaces that come into contact: bumps
fit into holes, unlike charges come together, dipoles align,
and hydrophobic atoms contact other hydrophobic atoms.
Although bulk water is excluded, individual water molecules
are frequently found filling space in intermolecular
interfaces; these waters usually form hydrogen bonds to one or
more atoms of the protein or to other bound water. Thus
proteins found in nature have not attained, nor do they
require, perfect complementarity to bind tightly and
specifically to their substrates. Only in rare cases is there
essentially perfect complementarity; then the binding is
extremely tight (as, for example, avidin binding to biotin).
C. Protein Engineering 

"Protein engineering" is the art of manipulating the 
sequence of a protein in order to alter its binding
characteristics. The factors affecting protein binding are
known (CHOT75, CHOT76, SCHU79 p98-107, and CREI84, Ch8), but
designing new complementary surfaces has proved difficult.
Although some rules have been developed for substituting side
groups (SUTC87b), the side groups of proteins are floppy and
it is difficult to predict what conformation a new side group
will take. Further, the forces that bind proteins to other
molecules are all relatively weak and it is difficult to
predict the effects of these forces.

Recently, Quiocho and collaborators (QUIO87) elucidated
the structures of several periplasmic binding proteins from
Gram-negative bacteria. They found that the proteins, despite
having low sequence homology and differences in structural
detail, have certain important structural similarities. Based
on their investigations of these binding proteins, Quiocho et
al. suggest it is unlikely that, using current protein
engineering methods, proteins can be constructed with binding
properties superior to those of proteins that occur naturally.
Nonetheless, there have been some isolated successes.
Wilkinson et al. (WILK84) reported that a mutant of the
tyrosyl tRNA synthetase of Bacillus stearothermophilus with
the mutation Thr51-->Pro exhibits a 100-fold increase in
affinity for ATP. Tan and Kaiser (TANK77) and Tschesche et
al. (TSCH87) showed that changing a single amino acid in a
mini-protein greatly reduces its binding to trypsin, but that some
of the mutants retained the parental characteristic of binding
to and inhibiting chymotrypsin, while others exhibited new
binding to elastase. Caruthers and others (EISE85) have shown
that changes of single amino acids on the surface of the
lambda Cro repressor greatly reduce its affinity for the
natural operator OR3, but greatly increase the binding of the
mutant protein to a mutant operator. Changing three residues
in subtilisin from Bacillus amyloliquefaciens to be the same
as the corresponding residues in subtilisin from B.
licheniformis produced a protease having nearly the same
activity as the latter subtilisin, even though 82 amino acid
sequence differences remained (WELL87a). Insertion of DNA
encoding 18 amino acids (corresponding to Pro-Glu-Dynorphin-Gly)
into the E. coli phoA gene so that the additional amino
acids appeared within a loop of the alkaline phosphatase
protein resulted in a chimeric protein having both phoA and
dynorphin activity (FREI90). Thus, changing the surface of a
binding protein may alter its specificity without abolishing
binding activity.
D. Techniques Of Mutagenesis 

Early techniques of mutating proteins involved
manipulations at the amino acid sequence level. In the
semisynthetic method (TSCH87), the protein was cleaved into
two fragments, a residue removed from the new end of one
fragment, the substitute residue added on in its place, and
the modified fragment joined with the other, original
fragment. Alternatively, the mutant protein could be
synthesized in its entirety (TANK77).

Erickson et al. suggested that mixed amino acid reagents
could be used to produce a family of sequence-related proteins
which could then be screened by affinity chromatography
(ERIC86). They envision successive rounds of mixed synthesis
of variant proteins and purification by specific binding.
They do not discuss how residues should be chosen for
variation. Because proteins cannot be amplified, the
researchers must sequence the recovered protein to learn which
substitutions improve binding. The researchers must limit the
level of diversity so that each variety of protein will be
present in sufficient quantity for the isolated fraction to be
sequenced.

With the development of recombinant DNA techniques, it 
became possible to obtain a mutant protein by mutating the 
gene encoding the native protein and then expressing the
mutated gene. Several mutagenesis strategies are known. One,
"protein surgery" (DILL87) , involves the introduction of one 
or more predetermined mutations within the gene of choice. A 
single polypeptide of completely predetermined sequence is 
expressed, and its binding characteristics are evaluated. 

At the other extreme is random mutagenesis by means of
relatively nonspecific mutagens such as radiation and various
chemical agents. See Ho et al. (HOCJ85) and Lehtovaara, E.P.
Appln. 285,123.

It is possible to randomly vary predetermined nucleotides
using a mixture of bases in the appropriate cycles of a
nucleic acid synthesis procedure. The proportion of bases in
the mixture, for each position of a codon, will determine the
frequency at which each amino acid will occur in the
polypeptides expressed from the degenerate DNA population.
Oliphant et al. (OLIP86) and Oliphant and Struhl (OLIP87) have
demonstrated ligation and cloning of highly degenerate
oligonucleotides, which were used in the mutation of
promoters. They suggested that similar methods could be used
in the variation of protein coding regions. They do not say
how one should: a) choose protein residues to vary, or b)
select or screen mutants with desirable properties.
Reidhaar-Olson and Sauer (REID88a) have used synthetic degenerate
oligonucleotides to vary simultaneously two or three residues
through all twenty amino acids. See also Vershon et al. (VERS86a;
VERS86b). Reidhaar-Olson and Sauer do not discuss the limits
on how many residues could be varied at once, nor do they
mention the problem of unequal abundance of DNA encoding
different amino acids. They looked for proteins that either
had wild-type dimerization or that did not dimerize. They did
not seek proteins having novel binding properties and did not
find any. This approach is likewise limited by the number of
colonies that can be examined (ROBE86).

To the extent that this prior work assumes that it is
desirable to adjust the level of mutation so that there is one
mutation per protein, it should be noted that many desirable
protein alterations require multiple amino acid substitutions
and thus are not accessible through single base changes or
even through all possible amino acid substitutions at any one
residue.
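The limitation on single base changes can be made concrete with a short sketch. The Python below is an illustration only and is not part of the disclosed method; the helper name single_base_neighbors is hypothetical. It uses the standard genetic code to list the amino acids reachable from a given codon by exactly one base change.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
# (first base varies slowest); '*' marks a stop codon.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def single_base_neighbors(codon):
    """Amino acids reachable from `codon` by exactly one base change.

    Stop codons and synonymous changes are excluded from the result.
    """
    original = CODON_TABLE[codon]
    reachable = set()
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            aa = CODON_TABLE[mutant]
            if aa != "*" and aa != original:
                reachable.add(aa)
    return reachable


if __name__ == "__main__":
    for codon in ("TGG", "ATG"):
        found = single_base_neighbors(codon)
        print(codon, CODON_TABLE[codon], sorted(found), len(found))
```

For the tryptophan codon TGG, for example, only five of the nineteen other amino acids are reachable by one base change, so a mutagenesis level tuned to single base changes samples a small and biased subset of the substitutions at each residue.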

D. Affinity Chromatography of Cells 

Ferenci and collaborators have published a series of
papers on the chromatographic isolation of mutants of the
maltose-transport protein LamB of E. coli (FERE82a, FERE82b,
FERE83, FERE84, CLUN84, HEIN87 and papers cited therein). The
mutants were either spontaneous or induced with nonspecific
chemical mutagens. Levels of mutagenesis were picked to
provide single point mutations or single insertions of two
residues. No multiple mutations were sought or found.

While variation was seen in the degree of affinity for
the conventional LamB substrates maltose and starch, there was
no selection for affinity to a target molecule not bound at
all by native LamB, and no multiple mutations were sought or
found. FERE84 speculated that the affinity chromatographic
selection technique could be adapted to development of similar
mutants of other "important bacterial surface-located
enzymes", and to selecting for mutations which result in the
relocation of an intracellular bacterial protein to the cell
surface. Ferenci's mutant surface proteins would not,
however, have been chimeras of a bacterial surface protein and
an exogenous or heterologous binding domain.

Ferenci also taught that there was no need to clone the 
structural gene, or to know the protein structure, active 
site, or sequence. The method of the present invention,
however, specifically utilizes a cloned structural gene. It
is not possible to construct and express a chimeric, outer 
surface-directed potential binding protein-encoding gene 
without cloning. 

Ferenci did not limit the mutations to particular loci or
particular substitutions. In the present invention, knowledge
of the protein structure, active site and/or sequence is used
as appropriate to predict which residues are most likely to
affect binding activity without unduly destabilizing the
protein, and the mutagenesis is focused upon those sites.
Ferenci does not suggest that surface residues should be
preferentially varied. In consequence, Ferenci's selection
system is much less efficient than that disclosed herein.

E. Bacterial and Viral Expression of Chimeric Surface Proteins

A number of researchers have directed unmutated foreign
antigenic epitopes to the surface of bacteria or phage, fused
to a native bacterial or phage surface protein, and
demonstrated that the epitopes were recognized by antibodies.
Thus, Charbit et al. (CHAR86) genetically inserted the C3
epitope of the VP1 coat protein of poliovirus into the LamB
outer membrane protein of E. coli, and determined
immunologically that the C3 epitope was exposed on the
bacterial cell surface. Charbit et al. (CHAR87) likewise
produced chimeras of LamB and the A (or B) epitopes of the
preS2 region of hepatitis B virus.

A chimeric LacZ/OmpB protein has been expressed in
E. coli and is, depending on the fusion, directed to either the
outer membrane or the periplasm (SILH77). A chimeric
LacZ/OmpA surface protein has also been expressed and
displayed on the surface of E. coli cells (Weinstock et al.,
WEIN83). Others have expressed and displayed on the surface
of a cell chimeras of other bacterial surface proteins, such
as E. coli type 1 fimbriae (Hedegaard and Klemm (HEDE89)) and
Bacteroides nodosus type 1 fimbriae (Jennings et al.,
JENN89). In none of the recited cases was the inserted
genetic material mutagenized.

Dulbecco (DULB86) suggests a procedure for incorporating
a foreign antigenic epitope into a viral surface protein so
that the expressed chimeric protein is displayed on the
surface of the virus in a manner such that the foreign epitope
is accessible to antibody. In 1985 Smith (SMIT85) reported
inserting a nonfunctional segment of the EcoRI endonuclease
gene into gene III of bacteriophage f1, "in phase". The gene
III protein is a minor coat protein necessary for infectivity.
Smith demonstrated that the recombinant phage were adsorbed by
immobilized antibody raised against the EcoRI endonuclease,
and could be eluted with acid. De la Cruz et al. (DELA88)
have expressed a fragment of the repeat region of the
circumsporozoite protein from Plasmodium falciparum on the
surface of M13 as an insert in the gene III protein. They
showed that the recombinant phage were both antigenic and
immunogenic in rabbits, and that such recombinant phage could
be used for B epitope mapping. The researchers suggest that
similar recombinant phage could be used for T epitope mapping
and for vaccine development.

None of these researchers suggested mutagenesis of the 
inserted material, nor is the inserted material a complete 
binding domain conferring on the chimeric protein the ability
to bind specifically to a receptor other than the antigen 
combining site of an antibody. 

McCafferty et al. (MCCA90) expressed a fusion of an Fv
fragment of an antibody to the N-terminal of the pIII protein.
The Fv fragment was not mutated.

F. Epitope Libraries on Fusion Phage 

Parmley and Smith (PARM88) suggested that an epitope
library that exhibits all possible hexapeptides could be
constructed and used to isolate epitopes that bind to
antibodies. In discussing the epitope library, the authors
did not suggest that it was desirable to balance the
representation of different amino acids. Nor did they teach
that the insert should encode a complete domain of the
exogenous protein. Epitopes are considered to be unstructured
peptides as opposed to structured proteins.

After the filing of the parent application whose benefit
is claimed herein under 35 U.S.C. 120, certain groups reported
the construction of "epitope libraries." Scott and Smith
(SCOT90) and Cwirla et al. (CWIR90) prepared "epitope
libraries" in which potential hexapeptide epitopes for a
target antibody were randomly mutated by fusing degenerate
oligonucleotides, encoding the epitopes, with gene III of fd
phage, and expressing the fused gene in phage-infected cells.
The cells manufactured fusion phage which displayed the
epitopes on their surface; the phage which bound to
immobilized antibody were eluted with acid and studied. In
both cases, the fused gene featured a segment encoding a
spacer region to separate the variable region from the wild
type pIII sequence so that the varied amino acids would not be
constrained by the nearby pIII sequence. Devlin et al.
(DEVL90) similarly screened, using M13 phage, for random
15-residue epitopes recognized by streptavidin. Again, a spacer
was used to move the random peptides away from the rest of the
chimeric phage protein. These references therefore taught
away from constraining the conformational repertoire of the
mutated residues.

Another problem with the Scott and Smith, Cwirla et al.,
and Devlin et al. libraries was that they provided a highly
biased sampling of the possible amino acids at each position.
Their primary concern in designing the degenerate
oligonucleotide encoding their variable region was to ensure
that all twenty amino acids were encodible at each position; a
secondary consideration was minimizing the frequency of
occurrence of stop signals. Consequently, Scott and Smith and
Cwirla et al. employed NNK (N=equal mixture of G, A, T, C;
K=equal mixture of G and T) while Devlin et al. used NNS
(S=equal mixture of G and C). There was no attempt to
minimize the frequency ratio of most favored-to-least favored
amino acid, or to equalize the rate of occurrence of acidic
and basic amino acids.
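The bias of the NNK and NNS schemes can be checked by direct enumeration. The following Python sketch is offered for illustration only; it assumes ideal equimolar base mixtures and the standard genetic code, and the function names codon_distribution and summarize are hypothetical. "Basic" is taken here to mean Lys plus Arg, which is itself an assumption of the sketch.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Degenerate base symbols used in the text (equimolar mixtures assumed).
MIXES = {"N": "ACGT", "K": "GT", "S": "CG"}


def codon_distribution(scheme):
    """Map each amino acid (or '*' for stop) to its frequency under `scheme`."""
    pools = [MIXES[symbol] for symbol in scheme]
    codons = ["".join(c) for c in product(*pools)]
    counts = Counter(CODON_TABLE[c] for c in codons)
    return {aa: Fraction(n, len(codons)) for aa, n in counts.items()}


def summarize(scheme):
    """Report the bias measures discussed in the text for one scheme."""
    dist = codon_distribution(scheme)
    amino_only = {aa: p for aa, p in dist.items() if aa != "*"}
    return {
        "amino_acids": len(amino_only),
        "stop": dist.get("*", Fraction(0)),
        "max_min_ratio": max(amino_only.values()) / min(amino_only.values()),
        "acidic": dist["D"] + dist["E"],   # Asp + Glu
        "basic": dist["K"] + dist["R"],    # Lys + Arg (assumed definition)
    }


if __name__ == "__main__":
    for scheme in ("NNK", "NNS"):
        print(scheme, summarize(scheme))
```

Under these assumptions both schemes give 32 codons encoding all twenty amino acids plus one stop codon, a 3:1 ratio between the most and least favored amino acids, and twice as many basic (Lys, Arg) as acidic (Asp, Glu) codons.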

Devlin et al. characterized several affinity-selected
streptavidin-binding peptides, but did not measure the
affinity constants for these peptides. Cwirla et al. did
determine the affinity constants for their peptides, but were
disappointed to find that their best hexapeptides had affinities
(350-300nM) "orders of magnitude" weaker than that of the
native Met-enkephalin epitope (7nM) recognized by the target
antibody. Cwirla et al. speculated that phage bearing
peptides with higher affinities remained bound under acidic
elution, possibly because of multivalent interactions between
phage (carrying about 4 copies of pIII) and the divalent
target IgG. Scott and Smith were able to find peptides whose
affinity for the target antibody (A2) was comparable to that
of the reference myohemerythrin epitope (50nM). However,
Scott and Smith likewise expressed concern that some
high-affinity peptides were lost, possibly through irreversible
binding of fusion phage to target.

G. Non-Commonly Owned Patents and Applications Naming Robert
Ladner as an Inventor

Ladner, US Patent No. 4,704,692, "Computer Based System
and Method for Determining and Displaying Possible Chemical
Structures for Converting Double- or Multiple-Chain
Polypeptides to Single-Chain Polypeptides" describes a design
method for converting proteins composed of two or more chains
into proteins of fewer polypeptide chains, but with
essentially the same 3D structure. There is no mention of
variegated DNA and no genetic selection. Ladner and Bird,
WO88/01649 (publ. March 10, 1988) disclose the specific
application of computerized design of linker peptides to the
preparation of single chain antibodies.

Ladner, Glick, and Bird, WO88/06630 (publ. 7 Sept. 1988
and having priority from US application 07/021,046, assigned
to Genex Corp.) (LGB) speculate that diverse single chain
antibody domains (SCAD) may be screened for binding to a
particular antigen by varying the DNA encoding the combining
determining regions of a single chain antibody, subcloning the
SCAD gene into the gpV gene of phage lambda so that a SCAD/gpV
chimera is displayed on the outer surface of phage lambda, and
selecting phage which bind to the antigen through affinity
chromatography. The only antigen mentioned is bovine growth
hormone. No other binding molecules, targets, carrier
organisms, or outer surface proteins are discussed. Nor is
there any mention of the method or degree of mutagenesis.
Furthermore, there is no teaching as to the exact structure of
the fusion nor of how to identify a successful fusion or how
to proceed if the SCAD is not displayed.

Ladner and Bird, WO88/06601 (publ. 7 September 1988)
suggest that single chain "pseudodimeric" repressors
(DNA-binding proteins) may be prepared by mutating a putative
linker peptide followed by in vivo selection, and that mutation
and selection may be used to create a dictionary of recognition
elements for use in the design of asymmetric repressors. The
repressors are not displayed on the outer surface of an
organism.

Methods of identifying residues in a protein which can be
replaced with a cysteine in order to promote the formation of
a protein-stabilizing disulfide bond are given in Pantoliano
and Ladner, U.S. Patent No. 4,903,773 (PANT90), Pantoliano
and Ladner (PANT87), Pabo and Suchanek (PABO86), MATS89, and
SAUE86.

No admission is made that any cited reference is prior 
art or pertinent prior art, and the dates given are those 
appearing on the reference and may not be identical to the 
actual publication date. All references cited in this 
specification are hereby incorporated by reference.




SUMMARY OF THE INVENTION 

The present invention is intended to overcome the
deficiencies discussed above. It relates to the construction,
expression, and selection of mutated genes that specify novel
proteins with desirable binding properties, as well as these
proteins themselves. The substances bound by these proteins,
hereinafter referred to as "targets", may be, but need not be,
proteins. Targets may include other biological or synthetic
macromolecules as well as other organic and inorganic
substances.

The fundamental principle of the invention is one of
forced evolution. In nature, evolution results from the
combination of genetic variation, selection for advantageous
traits, and reproduction of the selected individuals, thereby
enriching the population for the trait. The present invention
achieves genetic variation through controlled random
mutagenesis ("variegation") of DNA, yielding a mixture of DNA
molecules encoding different but related potential binding
proteins. It selects for mutated genes that specify novel
proteins with desirable binding properties by 1) arranging
that the product of each mutated gene be displayed on the
outer surface of a replicable genetic package (GP) (a cell,
spore or virus) that contains the gene, and 2) using affinity
selection -- selection for binding to the target material -- to
enrich the population of packages for those packages
containing genes specifying proteins with improved binding to
that target material. Finally, enrichment is achieved by
allowing only the genetic packages which, by virtue of the
displayed protein, bound to the target, to reproduce. The
evolution is "forced" in that selection is for the target
material provided.
The display strategy is first perfected by modifying a
genetic package to display a stable, structured domain (the
"initial potential binding domain", IPBD) for which an
affinity molecule (which may be an antibody) is obtainable.
The success of the modifications is readily measured by, e.g.,
determining whether the modified genetic package binds to the
affinity molecule.

The IPBD is chosen with a view to its tolerance for
extensive mutagenesis. Once it is known that the IPBD can be
displayed on a surface of a package and subjected to affinity
selection, the gene encoding the IPBD is subjected to a
special pattern of multiple mutagenesis, here termed
"variegation", which after appropriate cloning and
amplification steps leads to the production of a population of
genetic packages each of which displays a single potential
binding domain (a mutant of the IPBD), but which collectively
display a multitude of different though structurally related
potential binding domains (PBDs). Each genetic package
carries the version of the pbd gene that encodes the PBD
displayed on the surface of that particular package. Affinity
selection is then used to identify the genetic packages
bearing the PBDs with the desired binding characteristics, and
these genetic packages may then be amplified. After one or
more cycles of enrichment by affinity selection and
amplification, the DNA encoding the successful binding domains
(SBDs) may then be recovered from selected packages.
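The effect of repeated cycles of affinity selection and amplification can be illustrated with a toy calculation. The Python sketch below is a simplified model and not the disclosed procedure: it assumes that, in each cycle, packages displaying a binding domain are retained with one fixed probability and all other packages with a much lower fixed probability, and that amplification preserves the resulting proportions. The function name enrich and all numerical values are hypothetical.

```python
def enrich(fraction_binders, retain_binder, retain_background, rounds):
    """Fraction of binder-displaying packages after each selection cycle.

    fraction_binders  -- starting fraction of packages that display a binder
    retain_binder     -- assumed probability a binder survives one selection
    retain_background -- assumed probability a non-binder survives it
    rounds            -- number of selection/amplification cycles
    Returns a list with the starting fraction followed by one value per cycle.
    """
    history = [fraction_binders]
    f = fraction_binders
    for _ in range(rounds):
        kept_binders = f * retain_binder
        kept_background = (1.0 - f) * retain_background
        # Amplification is assumed to scale both classes equally,
        # so only the ratio after selection matters.
        f = kept_binders / (kept_binders + kept_background)
        history.append(f)
    return history


if __name__ == "__main__":
    # Hypothetical values: one binder per million packages and a
    # 1000-fold preference for binders in each cycle.
    for cycle, f in enumerate(enrich(1e-6, 0.5, 0.0005, 3)):
        print(f"cycle {cycle}: fraction of binders = {f:.6f}")
```

With these assumed values the binders make up roughly half the population after two cycles and more than 99% after three, which is why a small number of cycles may suffice when each cycle discriminates strongly.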

If need be, the DNA from the SBD-bearing packages may
then be further "variegated", using an SBD of the last round
of variegation as the "parental potential binding domain"
(PPBD) to the next generation of PBDs, and the process
continued until the worker in the art is satisfied with the
result. At that point, the SBD may be produced by any
conventional means, including chemical synthesis.

When the number of different amino acid sequences 
obtainable by mutation of the domain is large when compared to 
the number of different domains which are displayable in 
detectable amounts, the efficiency of the forced evolution is 
greatly enhanced by careful choice of which residues are to be 
varied. First, residues of a known protein which are likely 
to affect its binding activity (e.g., surface residues) and
not likely to unduly degrade its stability are identified. 
Then all or some of the codons encoding these residues are 
varied simultaneously to produce a variegated population of 
DNA. The variegated population of DNA is used to express a 
variety of potential binding domains, whose ability to bind 
the target of interest may then be evaluated. 

The method of the present invention is thus further 
distinguished from other methods in the nature of the highly 
variegated population that is produced and from which novel 
binding proteins are selected. We force the displayed 
potential binding domain to sample the nearby "sequence space" 
of related amino-acid sequences in an efficient, organized 
manner. Four goals guide the various variegation plans used 
herein, preferably: 1) a very large number (e.g., 10^7) of
variants is available, 2) a very high percentage of the 
possible variants actually appears in detectable amounts, 3) 
the frequency of appearance of the desired variants is 
relatively uniform, and 4) variation occurs only at a limited 
number of amino-acid residues, most preferably at residues 
having side groups directed toward a common region on the 
surface of the potential binding domain. 
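The tension between goals 1), 2) and 4) can be seen with simple arithmetic. The Python sketch below is illustrative only; it assumes that every varied residue is allowed all twenty amino acids (or, at the DNA level, all 32 codons of a hypothetical NNK-type scheme) and takes the library size of 10^7 mentioned above as an example. The function names are hypothetical.

```python
def variant_count(n_residues, alphabet=20):
    """Number of distinct sequences when n_residues positions are fully varied."""
    return alphabet ** n_residues


def max_fully_varied_residues(library_size, alphabet=20):
    """Largest number of positions whose full sequence space fits in the library."""
    n = 0
    while alphabet ** (n + 1) <= library_size:
        n += 1
    return n


if __name__ == "__main__":
    library_size = 10 ** 7  # example figure taken from the text
    for n in range(1, 8):
        print(n, variant_count(n), variant_count(n) <= library_size)
    print("protein level:", max_fully_varied_residues(library_size))
    print("DNA level (32 codons):", max_fully_varied_residues(library_size, 32))
```

Under these assumptions a population of 10^7 members can contain every amino acid sequence at no more than five fully varied residues, and every codon combination at no more than four, which is why the choice of residues to vary matters.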

This is to be distinguished from the simple use of
indiscriminate mutagenic agents such as radiation and
hydroxylamine to modify a gene, where there is no (or very
oblique) control over the site of mutation. Many of the
mutations will affect residues that are not a part of the
binding domain. Moreover, since at a reasonable level of
mutagenesis, any modified codon is likely to be characterized
by a single base change, only a limited and biased range of
possibilities will be explored. Equally remote is the use of
site-specific mutagenesis techniques employing mutagenic
oligonucleotides of nonrandomized sequence, since these
techniques do not lend themselves to the production and
testing of a large number of variants. While focused random
mutagenesis techniques are known, the importance of
controlling the distribution of variation has been largely
overlooked.

In order to obtain the display of a multitude of
different though related potential binding domains, applicants
generate a heterogeneous population of replicable genetic
packages each of which comprises a hybrid gene including a
first DNA sequence which encodes a potential binding domain
for the target of interest and a second DNA sequence which
encodes a display means, such as an outer surface protein
native to the genetic package but not natively associated with
the potential binding domain (or the parental binding domain
to which it is related) which causes the genetic package to
display the corresponding chimeric protein (or a processed
form thereof) on its outer surface.

It should be recognized that by expressing a hybrid
protein which comprises an outer surface transport signal not
natively associated with the binding domain, the utility of
the present invention is greatly extended. The binding domain
need not be that of a surface protein of the genetic package
(or, in the case of a viral package, of its host cell), since
the provided outer surface transport signal is responsible for
achieving the desired display. Thus, it is possible to
display on the surface of a phage, bacterial cell or bacterial
spore a binding domain related to the binding domain of a
normally cytoplasmic binding protein, or the binding domain of
a eukaryotic protein which is not found on the surface of
prokaryotic cells or viruses.

Another important aspect of the invention is that each
potential binding domain remains physically associated with
the particular DNA molecule which encodes it. Thus, once
successful binding domains are identified, one may readily
recover the gene and either express additional quantities of
the novel binding protein or further mutate the gene. The
form that this association takes is a "replicable genetic
package", a virus, cell or spore which replicates and
expresses the binding domain-encoding gene, and transports the
binding domain to its outer surface.

It is also possible to modify the PBDs chemically or
enzymatically before selection. The selection then identifies
the best modified amino acid sequence. For example, we could
treat the variegated population of genetic packages that
display a variegated population of binding domains with a
protein tyrosine kinase and then select for binding the
target. Any tyrosines on the BD surface will be
phosphorylated and this could affect the binding properties.
Other chemical or enzymatic modifications are possible.

By virtue of the present invention, proteins are obtained
which can bind specifically to targets other than the
antigen-combining sites of antibodies. A protein is not to be
considered a "binding protein" merely because it can be bound
by an antibody (see definition of "binding protein" which
follows). While almost any amino acid sequence of more than
about 6-8 amino acids is likely, when linked to an immunogenic
carrier, to elicit an immune response, any given random
polypeptide is unlikely to satisfy the stringent definition of
"binding protein" with respect to minimum affinity and
specificity for its substrate. It is only by testing numerous
random polypeptides simultaneously (and, in the usual case,
controlling the extent and character of the sequence
variation, i.e., limiting it to residues of a potential
binding domain having a stable structure, the residues being
chosen as more likely to affect binding than stability) that
this obstacle is overcome.

In one embodiment, the invention relates to: 

a) preparing a variegated population of replicable genetic
packages, each package including a nucleic acid construct
coding for an outer-surface-displayed potential binding
protein other than an antibody, comprising (i) a
structural signal directing the display of the protein
(or a processed form thereof) on the outer surface of the
package and (ii) a potential binding domain for binding
said target, where the population collectively displays a
multitude of different potential binding domains having a
substantially predetermined range of variation in
sequence,

b) causing the expression of said protein and the display of
said protein on the outer surface of such packages,

c) contacting the packages with target material, other than
an antibody with an exposed antigen-combining site, so
that the potential binding domains of the proteins and
the target material may interact, and separating packages
bearing a potential binding domain that succeeds in
binding the target material from packages that do not so
bind,



d) recovering and replicating at least one package bearing a 
successful binding domain, 

e) determining the amino acid sequence of the successful 
binding domain of a genetic package which bound to the 
target material, 

f) preparing a new variegated population of replicable
genetic packages according to step (a), the parental
potential binding domain for the potential binding
domains of said new packages being a successful binding
domain whose sequence was determined in step (e), and
repeating steps (b)-(e) with said new population, and,
when a package bearing a binding domain of desired
binding characteristics is obtained,

g) abstracting the DNA encoding the desired binding domain
from the genetic package and placing it into a suitable
expression system. (The binding domain may then be
expressed as a unitary protein, or as a domain of a
larger protein).

The invention is not, however, limited to proteins with a
single BD, since the method may be applied to any or all of the
BDs of the protein, sequentially or simultaneously. Nor is the
invention limited to biological synthesis of the binding
domains; peptides having an amino-acid sequence determined by
the isolated DNA can be chemically synthesized.

The invention further relates to a variegated population 
of genetic packages. Said population may be used by one user 
to select for binding to a first target, by a second user to 
select for binding to a second target, and so on, as the 
present invention does not require that the initial potential 
binding domain actually bind to the target of interest, and 
the variegation is at residues likely to affect binding. The 
invention also relates to the variegated DNA used in preparing
such genetic packages. 

The invention likewise encompasses the procedure by which
the display strategy is verified. The genetic packages are
engineered to display a single IPBD sequence. (Variability
may be introduced into DNA subsequences adjacent to the ipbd
subsequence and within the osp-ipbd gene so that the IPBD will
appear on the GP surface.) A molecule, such as an antibody,
having high affinity for correctly folded IPBD is used to: a)
detect IPBD on the GP surface, b) screen colonies for display
of IPBD on the GP surface, or c) select GPs that display IPBD
from a population, some members of which might display IPBD on
the GP surface. In one preferred embodiment, this
verification process (Part I) involves:

1) choosing a GP such as a bacterial cell, bacterial spore,
or phage, having a suitable outer surface protein (OSP),

2) choosing a stable IPBD,

3) designing an amino acid sequence that: a) includes the
IPBD as a subsequence and b) will cause the IPBD to
appear on the GP surface,

4) engineering a gene, denoted osp-ipbd, that: a) codes for
the designed amino acid sequence, b) provides the
necessary genetic regulation, and c) introduces
convenient sites for genetic manipulation,

5) cloning the osp-ipbd gene into the GP, and

6) harvesting the transformed GPs and testing them for
presence of IPBD on the GP surface; this test is
performed with an affinity molecule having high affinity
for IPBD, denoted AfM(IPBD).

Once a GP(IPBD) is produced, it can be used many times as
the starting point for developing different novel proteins
that bind to a variety of different targets. The knowledge of
how we engineer the appearance of one IPBD on the surface of a
GP can be used to design and produce other GP(IPBD)s that
display different IPBDs.

Knowing that a particular genetic package and osp-ipbd
fusion are suitable for the practice of the invention, we may
variegate the genetic packages and select for binding to a
target of interest. Using IPBD as the PPBD to the first cycle
of variegation, we prepare a wide variety of osp-pbd genes
that encode a wide variety of PBDs. We use an affinity
separation to enrich the population of GP(vgPBD)s for GPs that
display PBDs with binding properties relative to the target
that are superior to the binding properties of the PPBD. An
SBD selected from one variegation cycle becomes the PPBD to
the next variegation cycle. In a preferred embodiment, Part
II of the process of the present invention involves:

1) picking a target molecule, and an affinity separation 
system which selects for proteins having an affinity for 
that target molecule, 

2) picking a GP(IPBD), 

3) picking a set of several residues in the PPBD to vary;
the principal indicators of which residues to vary
include: a) the 3D structure of the IPBD, b) sequences of
homologous proteins, and c) computer or theoretical
modeling that indicates which residues can tolerate
different amino acids without disrupting the underlying
structure,

4) picking a subset of the residues picked in Part II.3, to
be varied simultaneously; the principal considerations
are the number of different variants and which variants
are within the detection capabilities of the affinity
separation system, and setting the range of variation;

5) implementing the variegation by:

a) synthesizing the part of the osp-pbd gene that
encodes the residues to be varied using a specific
mixture of nucleotide substrates for some or all of
the bases encoding residues slated for variation,
thereby creating a population of DNA molecules,
denoted vgDNA,

b) ligating this vgDNA, by standard methods, into the
operative cloning vector (OCV) (e.g. a plasmid or
bacteriophage),

c) using the ligated DNA to transform cells, thereby
producing a population of transformed cells,

d) culturing (i.e. increasing in number) the population
of transformed cells and harvesting the population of
GP(PBD)s, said population being denoted as GP(vgPBD),

e) enriching the population for GPs that bind the target
by using affinity separation, with the chosen target
molecule as affinity molecule,

f) repeating steps II.5.d and II.5.e until a GP(SBD)
having improved binding to the target is isolated,
and

g) testing the isolated SBD or SBDs for affinity and
specificity for the chosen target,

6) repeating steps II.3, II.4, and II.5 until the desired
degree of binding is obtained.

Part II is repeated for each new target material. Part I
need be repeated only if no GP(IPBD) suitable to a chosen
target is available.
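
The repeated culturing and enrichment of steps II.5.d and II.5.e may be illustrated by a minimal numerical sketch. It assumes, purely for illustration, that each affinity separation multiplies the ratio of target-binding to non-binding packages by a constant factor; this model, the function names, and the numbers used are assumptions and are not taken from the specification.

```python
def fraction_after_cycles(initial_fraction, enrichment_per_cycle, cycles):
    """Fraction of target-binding packages after repeated separation cycles.

    Assumed model (not from the specification): each cycle multiplies the
    odds of binders to non-binders by a constant enrichment factor.
    """
    if not 0.0 < initial_fraction < 1.0:
        raise ValueError("initial_fraction must lie strictly between 0 and 1")
    odds = initial_fraction / (1.0 - initial_fraction)
    odds *= enrichment_per_cycle ** cycles
    return odds / (1.0 + odds)


def cycles_needed(initial_fraction, enrichment_per_cycle, wanted_fraction):
    """Smallest number of cycles that brings binders to the wanted fraction."""
    if enrichment_per_cycle <= 1.0:
        raise ValueError("enrichment_per_cycle must exceed 1")
    if not 0.0 < wanted_fraction < 1.0:
        raise ValueError("wanted_fraction must lie strictly between 0 and 1")
    cycles = 0
    while fraction_after_cycles(initial_fraction, enrichment_per_cycle,
                                cycles) < wanted_fraction:
        cycles += 1
    return cycles


if __name__ == "__main__":
    # Hypothetical numbers: one binder per million packages and a
    # 1000-fold enrichment per separation cycle.
    print(cycles_needed(1e-6, 1000.0, 0.9))
```

Under these assumed numbers a rare binder comes to dominate the population within a few passes, which is why a modest enrichment per pass can suffice when several separation cycles are performed.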

For each target, there are a large number of SBDs that
may be found by the method of the present invention. The
process relies on a combination of protein structural
considerations, probabilities, and targeted mutations with
accumulation of information. To increase the probability that
some PBD in the population will bind to the target, we
generate as large a population as we can conveniently subject
to selection-through-binding in one experiment. Key questions
in management of the method are "How many transformants can we
produce?", and "How small a component can we find through
selection-through-binding?". The optimum level of variegation
is determined by the maximum number of transformants and the
selection sensitivity, so that for any reasonable sensitivity
we may use a progressive process to obtain a series of
proteins with higher and higher affinity for the chosen target
material.
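
The interplay of the two key questions can be sketched numerically. The sketch below assumes, as a simplification not stated in the specification, that every variant should be represented among the transformants and should also be findable by the separation, so that the number of variants is bounded by the smaller of the two quantities; the function name and the example figures are hypothetical, and sampling statistics are ignored.

```python
def max_varied_residues(max_transformants, sensitivity, options_per_residue=20):
    """Largest number n of residues that can be varied simultaneously.

    Assumption for illustration: the number of variants,
    options_per_residue ** n, must not exceed either the number of
    transformants or the selection sensitivity ("can find 1 in N").
    """
    if options_per_residue < 2:
        raise ValueError("options_per_residue must be at least 2")
    limit = min(max_transformants, sensitivity)
    n = 0
    while options_per_residue ** (n + 1) <= limit:
        n += 1
    return n


if __name__ == "__main__":
    # Hypothetical figures: 10**8 transformants, 1-in-10**7 sensitivity,
    # all twenty amino acids allowed at each varied residue.
    print(max_varied_residues(10**8, 10**7))
```

With the hypothetical figures shown, the bound is set by the selection sensitivity rather than by the number of transformants; raising `options_per_residue` to count codons instead of amino acids lowers the bound further.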

The appended claims are hereby incorporated by reference 
into this specification as an enumeration of the preferred 
embodiments . 

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 shows how a phage may be used as a genetic package. 

At (a) we have a wild-type precoat protein lodged in the
lipid bilayer. The signal peptide is in the periplasmic
space. At (b), a chimeric precoat protein, with a
potential binding domain interposed between the signal
peptide and the mature coat protein sequence, is
similarly trapped. At (c) and (d), the signal peptide
has been cleaved off the wild-type and chimeric proteins,
respectively, but certain residues of the coat protein
sequence interact with the lipid bilayer to prevent the
mature protein from passing entirely into the periplasm.
At (e) and (f), mature wild-type and chimeric protein are
assembled into the coat of a single-stranded DNA phage as
it emerges into the periplasmic space. The phage will
pass through the outer membrane into the medium where it
can be recovered and chromatographically evaluated.
Figure 2 depicts (a) the optimal stereochemistry of a
disulfide bond, based on Creighton, "Disulfide Bonds and
Protein Stability" (CREI88) (the two possible torsion
angles about the disulfide bond of +90° and -90° are
equally likely), and (b) the standard geometric
parameters for the disulfide bond, following Katz and
Kossiakoff (KATZ86). The average Cα-Cα distance is 5-6
Å, and the typical S-S bond length is ≈2.0 Å. Many
left-hand disulfides adopt as a preferred geometry
χ1 = -60°, χ2 = -60°, χ3 = -85°, χ2' = -60°, χ1' = -60°,
Cα-Cα = 5.88 Å; right-hand disulfides are more variable.

Figure 3 shows a mini-protein comprising eight residues,
numbered 4 through 11, in which residues 5 and 10 are
joined by a disulfide. The β carbons are labeled for
residues 4, 6, 7, 8, 9, and 11; these residues are
preferred sites of variegation.

Figure 4 shows the Cαs of the coat protein of phage f1.

Figure 5 shows the construction of M13-MB51.

Figure 6 shows construction of MK-BPTI, also known as BPTI-III
MK.

Figure 7 illustrates fractionation of the Mini PEPI library on
HNE beads. The abscissa shows pH of buffer. The
ordinate shows amount of phage (as fraction of input
phage) obtained at given pH. Ordinate scaled by 10^3.

Figure 8 illustrates fractionation of the MYMUT PEPI library
on HNE beads. The abscissa shows pH of buffer. The
ordinate shows amount of phage (as fraction of input
phage) obtained at given pH. Ordinate scaled by 10^3.

Figure 9 shows the elution profiles for EpiNE clones 1, 3, and
7. Each profile is scaled so that the peak is 1.0 to
emphasize the shape of the curve.

Figure 10 shows pH profile for the binding of BPTI-III MK and
EpiNE1 on cathepsin G beads. The abscissa shows pH of
buffer. The ordinate shows amount of phage (as fraction
of input phage) obtained at given pH. Ordinate scaled
by 10^3.

Figure 11 shows pH profile for the fractionation of the MYMUT
library on cathepsin G beads. The abscissa shows pH of
buffer. The ordinate shows amount of phage (as fraction
of input phage) obtained at given pH. Ordinate scaled
by 10^3.

Figure 12 shows a second fractionation of MYMUT library over
cathepsin G.

Figure 13 shows elution profiles on immobilized cathepsin G
for phage selected for binding to cathepsin G.

Figure 14 shows the Cαs of BPTI and interaction set #2.

Figure 15 shows the main chain of scorpion toxin (Brookhaven
Protein Data Bank entry 1SN3) residues 20 through 42.
CYS25 and CYS41 are shown forming a disulfide.
In the native protein these groups form disulfides to
other cysteines, but no main-chain motion is required to
bring the gamma sulphurs into acceptable geometry.
Residues, other than GLY, are labeled at the β carbon
with the one-letter code.

Figure 16 shows profiles of the elution of phage that display
EpiNE7 and EpiNE7.23 from HNE beads.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
OVERVIEW

I. DEFINITIONS AND ABBREVIATIONS

II. THE INITIAL POTENTIAL BINDING DOMAIN
    A. Generally
    B. Influence of Target Size on Choice of IPBD
    C. Influence of Target Charge on Choice of IPBD
    D. Other Considerations in the Choice of IPBD
    E. Bovine Pancreatic Trypsin Inhibitor (BPTI) as an
       IPBD
    F. Mini-Proteins as IPBDs
    G. Modified PBDs

III. VARIEGATION STRATEGY - MUTAGENESIS TO OBTAIN POTENTIAL
     BINDING DOMAINS WITH DESIRED DIVERSITY
    A. Generally
    B. Identification of Residues to be Varied
    C. Determining the Substitution Set for Each Parental
       Residue
    D. Special Considerations Relating to Variegation of
       Mini-Proteins with Essential Cysteines
    E. Planning the Second and Later Rounds of Variegation

IV. DISPLAY STRATEGY - DISPLAYING FOREIGN BINDING DOMAINS ON
    THE SURFACE OF A "GENETIC PACKAGE"
    A. General Requirements for Genetic Package
    B. Phages for Use as Genetic Packages
    C. Bacterial Cells as Genetic Packages
    D. Bacterial Spores as Genetic Packages
    E. Artificial Outer Surface Protein
    F. Designing the osp::ipbd Gene Insert
    G. Synthesis of Gene Inserts
    H. Operative Cloning Vector
    I. Transformation of Cells
    J. Verification of Display Strategy
    K. Analysis and Correction of Display Problems

V. AFFINITY SELECTION OF TARGET-BINDING MUTANTS
    A. Affinity Separation Technology, Generally
    B. Affinity Chromatography, Generally
    C. Fluorescent-Activated Cell Sorting, Generally
    D. Affinity Electrophoresis, Generally
    E. Target Materials
    F. Immobilization or Labeling of Target Material
    G. Elution of Lower Affinity PBD-Bearing Packages
    H. Optimization of Affinity Separation
    I. Measuring the Sensitivity of Affinity Separation
    J. Measuring the Efficiency of Separation
    K. Reducing Selection due to Non-Specific Binding
    L. Isolation of Genetic Package PBDs with
       Binding-to-Target Phenotypes
    M. Recovery of Packages
    N. Amplifying the Enriched Packages
    O. Determining Whether Further Enrichment is Needed
    P. Characterizing the Putative SBDs
    Q. Joint Selections
    R. Selection for Non-Binding
    S. Selection of Potential Binding Domains for Retention
       of Structure
    T. Engineering of Antagonists

VI. EXPLOITATION OF SUCCESSFUL BINDING DOMAINS AND
    CORRESPONDING DNAS
    A. Generally
    B. Production of Novel Binding Proteins
    C. Mini-Protein Production
    D. Uses of Novel Binding Proteins

VII. EXAMPLES

I. DEFINITIONS AND ABBREVIATIONS 

Let K_d(x,y) be a dissociation constant,

$$K_d(x,y) = \frac{[x][y]}{[x:y]}$$



For the purposes of the appended claims, a protein P is a
binding protein if (1) for one molecular, ionic or atomic
species A, other than the variable domain of an antibody, the
dissociation constant K_D(P,A) < 10^-6 moles/liter (preferably,
< 10^-7 moles/liter), and (2) for a different molecular, ionic
or atomic species B, K_D(P,B) > 10^-4 moles/liter (preferably,
> 10^-1 moles/liter). As a result of these two conditions, the
protein P exhibits specificity for A over B, and a minimum
degree of affinity (or avidity) for A.
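
The two-part test above can be written out as a short predicate. This is only a sketch of the stated thresholds (dissociation constant below 10^-6 moles/liter for A, preferably below 10^-7, and above 10^-4 moles/liter for B, preferably above 10^-1); the function and parameter names are hypothetical, and the exclusion of antibody variable domains as species A is not modeled.

```python
def is_binding_protein(kd_for_a, kd_for_b, preferred=False):
    """Apply the two dissociation-constant conditions of the definition.

    kd_for_a: K_D(P,A) in moles/liter for the species that is bound.
    kd_for_b: K_D(P,B) in moles/liter for a different species.
    preferred: use the stricter "preferably" thresholds.
    The exclusion of antibody variable domains as species A is not
    checked here and must be established separately.
    """
    max_kd_for_a = 1e-7 if preferred else 1e-6
    min_kd_for_b = 1e-1 if preferred else 1e-4
    return kd_for_a < max_kd_for_a and kd_for_b > min_kd_for_b


if __name__ == "__main__":
    # Hypothetical values: nanomolar binding to A, millimolar to B.
    print(is_binding_protein(1e-9, 1e-3))
    print(is_binding_protein(1e-9, 1e-3, preferred=True))
```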

The exclusion of "variable domain of an antibody" in (1)
above is intended to make clear that for the purposes herein a
protein is not to be considered a "binding protein" merely
because it is antigenic. However, an antigen may nonetheless
qualify as a binding protein because it specifically binds to
a substance other than an antibody, e.g., an enzyme for its
substrate, or a hormone for its cellular receptor.
Additionally, it should be pointed out that "binding protein"
may include a protein which binds specifically to the Fc of an
antibody, e.g., staphylococcal protein A.

Normally, the binding protein will not be an antibody or
an antigen-binding derivative thereof. An antibody is a
crosslinked complex of four polypeptides (two heavy and two
light chains). The light chains of IgG have a molecular
weight of ≈23,000 daltons and the heavy chains of ≈53,000
daltons. A single binding unit is composed of the variable
region of a heavy chain (V_H) and the variable region of a light
chain (V_L), each about 110 amino-acid residues. The V_H and V_L
regions are held in proximity by a disulfide bond between the
adjoining C_L and C_H1 regions; altogether, these total 440
residues and correspond to an Fab fragment. Derivatives of
antibodies include Fab fragments and the individual variable
light and heavy domains. A special case of antibody
derivative is a "single chain antibody." A "single-chain
antibody" is a single chain polypeptide comprising at least
200 amino acids, said amino acids forming two antigen-binding
regions connected by a peptide linker that allows the two
regions to fold together to bind the antigen in a manner akin
to that of an Fab fragment. Either the two antigen-binding
regions must be variable domains of known antibodies, or they
must (1) each fold into a β barrel of nine strands that are
spatially related in the same way as are the nine strands of
known antibody variable light or heavy domains, and (2) fit
together in the same way as do the variable domains of said
known antibody. Generally speaking, this will require that,
with the exception of the amino acids corresponding to the
hypervariable region, there is at least 88% homology with the
amino acids of the variable domain of a known antibody.

While the present invention may be used to develop novel
antibodies through variegation of codons corresponding to the
hypervariable region of an antibody's variable domain, its
primary utility resides in the development of binding proteins
which are not antibodies or even variable domains of
antibodies. Novel antibodies can be obtained by immunological
techniques; novel enzymes, hormones, etc. cannot.

It will be appreciated that, as a result of evolution,
the antigen-binding domains of antibodies have acquired a
structure which tolerates great variability of sequence in the
hypervariable regions. The remainder of the variable domain
is made up of constant regions forming a distinctive
structure, a nine-strand β barrel, which holds the
hypervariable regions (inter-strand loops) in a fixed
relationship with each other. Most other binding proteins
lack this molecular design which facilitates diversification
of binding characteristics. Consequently, the successful
development of novel antibodies by modification of sequences
encoding known hypervariable regions -- which, in nature, vary
from antibody to antibody -- does not provide any guidance or
assurance of success in the development of novel,
non-immunoglobulin binding proteins.

It should further be noted that the affinity of
antibodies for their target epitopes is typically on the order
of 10^6 to 10^10 liters/mole; many enzymes exhibit much greater
affinities (10^9 to 10^15 liters/mole) for their preferred
substrates. Thus, if the goal is to develop a binding protein
with a very high affinity for a target of interest, e.g.,
greater than 10^10 liters/mole, the antibody design may in fact
be unduly limiting. Furthermore, the complementarity-determining
residues of an antibody comprise many residues, 30 to 50. In
most cases, it is not known which of these residues
participate directly in binding antigen. Thus, picking an
antibody as PPBD does not allow us to focus variegation to a
small number of residues.
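
The affinities in the preceding paragraph are quoted as association constants (liters/mole), whereas the definition of "binding protein" is framed in dissociation constants (moles/liter). For simple one-to-one binding the two are reciprocals, and the following minimal sketch makes the conversion explicit; the function name is hypothetical.

```python
def dissociation_constant(association_constant):
    """Convert an association constant (liters/mole) to K_d (moles/liter).

    Holds for simple bimolecular binding, where K_d = 1 / K_a.
    """
    if association_constant <= 0:
        raise ValueError("association_constant must be positive")
    return 1.0 / association_constant


if __name__ == "__main__":
    # Ranges quoted in the text, expressed as association constants.
    for label, low, high in (("antibodies", 1e6, 1e10),
                             ("enzymes", 1e9, 1e15)):
        print(label, dissociation_constant(high), "to",
              dissociation_constant(low), "moles/liter")
```

On this scale the antibody range corresponds to dissociation constants of 10^-10 to 10^-6 moles/liter, so an antibody at the low-affinity end of the range would lie right at the 10^-6 moles/liter boundary of the definition.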

Most larger proteins fold into distinguishable globules
called domains (ROSS81). Protein domains have been defined
various ways, but all definitions fall into one of three
classes: a) those that define a domain in terms of 3D atomic
coordinates, b) those that define a domain as an isolable,
stable fragment of a larger protein, and c) those that define
a domain based on protein sequence homology plus a method from
class a) or b). Frequently, different methods of defining
domains applied to a single protein yield identical or very
similar domain boundaries. The diversity of definitions for
domains stems from the many ways that protein domains are
perceived to be important, including the concept of domains in
predicting the boundaries of stable fragments, and the
relationship of domains to protein folding, function,
stability, and evolution. The present invention emphasizes
the retention of the structured character of a domain even
though its surface residues are mutated. Consequently,
definitions of "domain" which emphasize stability -- retention
of the overall structure in the face of perturbing forces such
as elevated temperatures or chaotropic agents -- are favored,
though atomic coordinates and protein sequence homology are
not completely ignored.

When a domain of a protein is primarily responsible for
the protein's ability to specifically bind a chosen target, it
is referred to herein as a "binding domain" (BD). A
preliminary operation is to engineer the appearance of a
stable protein domain, denoted as an "initial potential
binding domain" (IPBD), on the surface of a genetic package.

The term "variegated DNA" (vgDNA) refers to a mixture of
DNA molecules of the same or similar length which, when
aligned, vary at some codons so as to encode at each such
codon a plurality of different amino acids, but which encode
only a single amino acid at other codon positions. It is
further understood that in variegated DNA, the codons which
are variable, and the range and frequency of occurrence of the
different amino acids which a given variable codon encodes,
are determined in advance by the synthesizer of the DNA, even
though the synthetic method does not allow one to know, a
priori, the sequence of any individual DNA molecule in the
mixture. The number of designated variable codons in the
variegated DNA is preferably no more than 20 codons, and more
preferably no more than 5-10 codons. The mix of amino acids
encoded at each variable codon may differ from codon to codon.
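
How the nucleotide mixture at a variable codon fixes, in advance, the range and frequency of the encoded amino acids can be illustrated with a short sketch. It assumes equimolar nucleotide mixtures written with IUPAC ambiguity codes and uses the widely known NNK codon (N = A/C/G/T, K = G/T) purely as an example; neither that choice nor the function names come from the specification.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G;
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Subset of IUPAC ambiguity codes; each is assumed to be an equimolar mix.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "ACGT", "K": "GT", "S": "CG"}


def encoded_amino_acids(degenerate_codon):
    """Count how many codons of the mixture encode each amino acid."""
    counts = Counter()
    for bases in product(*(IUPAC[symbol] for symbol in degenerate_codon)):
        counts[CODON_TABLE["".join(bases)]] += 1
    return counts


if __name__ == "__main__":
    counts = encoded_amino_acids("NNK")
    coding = {aa: n for aa, n in counts.items() if aa != "*"}
    print(sorted(coding.items()))
    # Ratio of most-favored to least-favored amino acid abundance.
    print(max(coding.values()) / min(coding.values()))
```

The counts returned are the relative abundances of DNA molecules encoding each amino acid under the equimolar assumption, so the ratio of the most-favored to the least-favored amino acid can be read off directly; a non-equimolar mixture would require weighting each base instead of counting codons.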

A population of genetic packages into which variegated
DNA has been introduced is likewise said to be "variegated".

For the purposes of this invention, the term "potential
binding protein" refers to a protein encoded by one species of
DNA molecule in a population of variegated DNA wherein the
region of variation appears in one or more subsequences
encoding one or more segments of the polypeptide having the
potential of serving as a binding domain for the target
substance.

From time to time, it may be helpful to speak of the
"parent sequence" of the variegated DNA. When the novel
binding domain sought is an analogue of a known binding
domain, the parent sequence is the sequence that encodes the
known binding domain. The variegated DNA will be identical
with this parent sequence at one or more loci, but will
diverge from it at chosen loci. When a potential binding
domain is designed from first principles, the parent sequence
is a sequence which encodes the amino acid sequence that has
been predicted to form the desired binding domain, and the
variegated DNA is a population of "daughter DNAs" that are
related to that parent by a recognizable sequence similarity.

A "chimeric protein" is a protein composed of a first 
amino acid sequence substantially corresponding to the 
sequence of a protein or to a large fragment of a protein (20
or more residues) expressed by the species in which the
chimeric protein is expressed and a second amino acid sequence
that does not substantially correspond to an amino acid
sequence of a protein expressed by the first species but that
does substantially correspond to the sequence of a protein
expressed by a second and different species of organism. The
second sequence is said to be foreign to the first sequence.

One amino acid sequence of the chimeric proteins of the
present invention is typically derived from an outer surface
protein of a "genetic package" as hereafter defined. The
second amino acid sequence is one which, if expressed alone,
would have the characteristics of a protein (or a domain
thereof) but is incorporated into the chimeric protein as a
recognizable domain thereof. It may appear at the amino or
carboxy terminal of the first amino acid sequence (with or
without an intervening spacer), or it may interrupt the first
amino acid sequence. The first amino acid sequence may
correspond exactly to a surface protein of the genetic
package, or it may be modified, e.g., to facilitate the
display of the binding domain.

In the present invention, the words "select" and
"selection" are used in the genetic sense; i.e. a biological
process whereby a phenotypic characteristic is used to enrich
a population for those organisms displaying the desired
phenotype.

One affinity separation is called a "separation cycle";
one pass of variegation followed by as many separation cycles
as are needed to isolate an SBD is called a "variegation
cycle". The amino acid sequence of one SBD from one round
becomes the PPBD to the next variegation cycle. We perform
variegation cycles iteratively until the desired affinity and
specificity of binding between an SBD and chosen target are
achieved.

The following abbreviations will be used throughout the
present specification:

Abbreviation     Meaning

GP               Genetic Package, e.g. a bacteriophage
wtGP             Wild-type GP
X                Any protein
x                The gene for protein X
BD               Binding Domain
BPTI             Bovine pancreatic trypsin inhibitor, identical to
                 aprotinin (Merck Index, entry 784, p. 119 (SEQ ID
                 NO:44))
IPBD             Initial Potential Binding Domain, e.g. BPTI
PBD              Potential Binding Domain, e.g. a derivative of BPTI
SBD              Successful Binding Domain, e.g. a derivative of BPTI
                 selected for binding to a target
PPBD             Parental Potential Binding Domain, i.e. an IPBD or
                 an SBD from a previous selection
OSP              Outer Surface Protein, e.g. coat protein of a phage
                 or LamB from E. coli
OSP-PBD          Fusion of an OSP and a PBD, order of fusion not
                 specified
OSTS             Outer Surface Transport Signal
GP(x)            A genetic package containing the x gene
GP(X)            A genetic package that displays X on its outer
                 surface
GP(osp-pbd)      GP containing an osp-pbd gene
GP(OSP-PBD)      A genetic package that displays PBD on its outside
                 as a fusion to OSP
GP(pbd)          GP containing a pbd gene, osp implicit
GP(PBD)          A genetic package displaying PBD on its outside,
                 OSP unspecified
{Q}              An affinity matrix supporting "Q", e.g. {T4
                 lysozyme} is T4 lysozyme attached to an affinity
                 matrix
AfM(W)           A molecule having affinity for "W", e.g. trypsin is
                 an AfM(BPTI)
AfM(W)*          AfM(W) carrying a label, e.g. 125I
XINDUCE          A chemical that can induce expression of a gene,
                 e.g. IPTG for the lacUV5 promoter
OCV              Operative Cloning Vector
Kd               A bimolecular dissociation constant,
                 Kd = [A][B]/[A:B]
K_T              K_T = [T][SBD]/[T:SBD] (T is a target)
K_N              K_N = [N][SBD]/[N:SBD] (N is a non-target)
DoAMoM           Density of AfM(W) on affinity matrix
mfaa             Most-Favored amino acid
lfaa             Least-Favored amino acid
Abun(x)          Abundance of DNA molecules encoding amino acid x
Outer membrane protein

nucleotide

Signal-sequence Peptidase I

Yield of ssDNA up to Q bases long

Maximum length of ssDNA that can be synthesized in
acceptable yield

Yield of plasmid DNA per volume of culture

DNA ligation efficiency

Maximum number of transformants produced from Y_D100 DNA of
Insert

Efficiency of chromatographic enrichment, enrichment per
pass

Sensitivity of chromatographic separation, can find 1 in N

Maximum number of enrichment cycles per variegation cycle

Error level in synthesizing vgDNA

in-frame genetic fusion or protein produced from in-frame
fused gene
Single-letter codes for amino acids and nucleotides are given
in Table 1.

***

II. THE INITIAL POTENTIAL BINDING DOMAIN (IPBD) : 

II .A. Generally 

The initial potential binding domain may be: 1) a domain
of a naturally occurring protein, 2) a non-naturally occurring
domain which substantially corresponds in sequence to a
naturally occurring domain, but which differs from it in
sequence by one or more substitutions, insertions or
deletions, 3) a domain substantially corresponding in sequence
to a hybrid of subsequences of two or more naturally occurring
proteins, or 4) an artificial domain designed entirely on
theoretical grounds based on knowledge of amino acid
geometries and statistical evidence of secondary structure
preferences of amino acids. (However, the limitations of a
priori protein design prompted the present invention.)

Usually, the domain will be a known binding domain, or at
least a homologue thereof, but it may be derived from a
protein which, while not possessing a known binding activity,
possesses a secondary or higher structure that lends itself to
binding activity (clefts, grooves, etc.). The protein to
which the IPBD is related need not have any specific affinity
for the target material.

In determining whether sequences should be deemed to
"substantially correspond", one should consider the following
issues: the degree of sequence similarity when the sequences
are aligned for best fit according to standard algorithms, the
similarity in the connectivity patterns of any crosslinks
(e.g., disulfide bonds), the degree to which the proteins have
similar three-dimensional structures, as indicated by, e.g.,
X-ray diffraction analysis or NMR, and the degree to which the
sequenced proteins have similar biological activity. In this
context, it should be noted that among the serine protease
inhibitors, there are families of proteins recognized to be
homologous in which there are pairs of members with as little
as 30% sequence homology.
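
The first of these issues, sequence similarity after alignment, can be illustrated with a minimal sketch. The function name `percent_identity`, the gap character, and the toy sequences are assumptions made for illustration; the text does not prescribe a particular alignment algorithm or scoring scheme, and the sketch expects sequences that have already been aligned.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of compared alignment columns holding identical residues.

    Both inputs must already be aligned (equal length, gaps marked).
    Columns in which either sequence has a gap are not counted.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Hypothetical toy alignment, not taken from any table in this document.
example = percent_identity("ACDE", "ACDF")
```

A figure near 30% from such a calculation would, on the reasoning above, not by itself rule out homology; crosslink connectivity, 3D structure, and activity are weighed as well.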

A candidate IPBD should meet the following criteria:

1) a domain exists that will remain stable under the
conditions of its intended use (the domain may comprise
the entire protein that will be inserted, e.g. BPTI (SEQ
ID NO:44), α-conotoxin GI, or CMTI-III),

2) knowledge of the amino acid sequence is obtainable, and

3) a molecule is obtainable having specific and high
affinity for the IPBD, AfM(IPBD).

Preferably, in order to guide the variegation strategy,
knowledge of the identity of the residues on the domain's
outer surface, and their spatial relationships, is obtainable;
however, this consideration is less important if the binding
domain is small, e.g., under 40 residues.

Preferably, the IPBD is no larger than necessary because
small SBDs (for example, less than 30 amino acids) can be
chemically synthesized and because it is easier to arrange
restriction sites in smaller amino-acid sequences. For PBDs
smaller than about 40 residues, an added advantage is that the
entire variegated pbd gene can be synthesized in one piece.
In that case, we need arrange only suitable restriction sites
in the osp gene. A smaller protein minimizes the metabolic
strain on the GP or the host of the GP. The IPBD is
preferably smaller than about 200 residues. The IPBD must
also be large enough to have acceptable binding affinity and
specificity. For an IPBD lacking covalent crosslinks, such as
disulfide bonds, the IPBD is preferably at least 40 residues;
it may be as small as six residues if it contains a crosslink.
These small, crosslinked IPBDs, known as "mini-proteins", are
discussed in more detail later in this section.

Some candidate IPBDs, which meet the conditions set forth
above, will be more suitable than others. Information about
candidate IPBDs that will be used to judge the suitability of
the IPBD includes: 1) a 3D structure (knowledge strongly
preferred), 2) one or more sequences homologous to the IPBD
(the more homologous sequences known, the better), 3) the pI
of the IPBD (knowledge desirable when target is highly
charged), 4) the stability and solubility as a function of
temperature, pH and ionic strength (preferably known to be
stable over a wide range and soluble in conditions of intended
use), 5) ability to bind metal ions such as Ca++ or Mg++
(knowledge preferred; binding per se, no preference), 6)
enzymatic activities, if any (knowledge preferred, activity
per se has uses but may cause problems), 7) binding
properties, if any (knowledge preferred, specific binding also
preferred), 8) availability of a molecule having specific and
strong affinity (Kd < 10^-11 M) for the IPBD (preferred), 9)
availability of a molecule having specific and medium affinity
(10^-8 M < Kd < 10^-6 M) for the IPBD (preferred), 10) the
sequence of a mutant of IPBD that does not bind to the
affinity molecule(s) (preferred), and 11) absorption spectrum
in visible, UV, NMR, etc. (characteristic absorption
preferred).

If only one species of molecule having affinity for IPBD
(AfM(IPBD)) is available, it will be used to: a) detect the
IPBD on the GP surface, b) optimize expression level and
density of the affinity molecule on the matrix, and c)
determine the efficiency and sensitivity of the affinity
separation. As noted above, however, one would prefer to have
available two species of AfM(IPBD), one with high and one with
moderate affinity for the IPBD. The species with high
affinity would be used in initial detection and in determining
efficiency and sensitivity, and the species with moderate
affinity would be used in optimization.

If the IPBD is not itself a binding domain of a known
binding protein, or if its native target has not been
purified, an antibody raised against the IPBD may be used as
the affinity molecule. Use of an antibody for this purpose
should not be taken to mean that the antibody is the ultimate
target.

There are many candidate IPBDs for which all of the above
information is available or is reasonably practical to obtain,
for example, bovine pancreatic trypsin inhibitor (BPTI, 58
residues), CMTI-III (29 residues), crambin (46 residues),
third domain of ovomucoid (56 residues), heat-stable
enterotoxin (ST-Ia of E. coli) (18 residues), α-Conotoxin GI
(13 residues), μ-Conotoxin GIII (22 residues), Conus King Kong
mini-protein (27 residues), T4 lysozyme (164 residues), and
azurin (128 residues). Structural information can be obtained
from X-ray or neutron diffraction studies, NMR, chemical
cross-linking or labeling, modeling from known structures of
related proteins, or from theoretical calculations. 3D structural
information obtained by X-ray diffraction, neutron diffraction
or NMR is preferred because these methods allow localization
of almost all of the atoms to within defined limits. Table 50
lists several preferred IPBDs. Works related to determination
of 3D structure of small proteins via NMR include: CHAZ85,
PEAS90, PEAS88, CLOR86, CLOR87a, HEIT89, LECO87, WAGN79, and
PARD89.

In some cases, a protein having some affinity for the
target may be a preferred IPBD even though some other criteria
are not optimally met. For example, the V1 domain of CD4 is a
good choice as IPBD for a protein that binds to gp120 of HIV.
It is known that mutations in the region 42 to 55 of V1
greatly affect gp120 binding and that other mutations either
have much less effect or completely disrupt the structure of
V1. Similarly, tumor necrosis factor (TNF) would be a good
initial choice if one wants a TNF-like molecule having higher
affinity for the TNF receptor.

Membrane-bound proteins are not preferred IPBDs, though
they may serve as a source of outer surface transport signals.
One should distinguish membrane-bound proteins, such
as LamB or OmpF, that cross the membrane several times forming
a structure that is embedded in the lipid bilayer and in which
the exposed regions are the loops that join trans-membrane
segments, from non-embedded proteins, such as the soluble
domains of CD4, that are simply anchored to the membrane.
This is an important distinction because it is quite difficult
to create a soluble derivative of a membrane-bound protein.
Soluble binding proteins are in general more useful since
purification is simpler and they are more tractable and more
versatile assay reagents.

Most of the PBDs derived from a PPBD according to the
process of the present invention will have been derived by
variegation at residues having side groups directed toward the
solvent. Reidhaar-Olson and Sauer (REID88a) found that
exposed residues can accept a wide range of amino acids, while
buried residues are more limited in this regard. Surface
mutations typically have only small effects on melting
temperature of the PBD, but may reduce the stability of the
PBD. Hence the chosen IPBD should have a high melting
temperature (50°C acceptable, the higher the better; BPTI
melts at 95°C) and be stable over a wide pH range (8.0 to 3.0
acceptable; 11.0 to 2.0 preferred), so that the SBDs derived
from the chosen IPBD by mutation and selection-through-binding
will retain sufficient stability. Preferably, the
substitutions in the IPBD yielding the various PBDs do not
reduce the melting point of the domain below ≈40°C. Mutations
may arise that increase the stability of SBDs relative to the
IPBD, but the process of the present invention does not depend
upon this occurring. Proteins containing covalent crosslinks,
such as multiple disulfides, are usually sufficiently stable. A
protein having at least two disulfides and having at least one
disulfide per every twenty residues may be presumed to be
sufficiently stable.
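
The presumption in the last sentence can be written as a simple predicate. This is a minimal sketch; the function name `presumed_stable` is hypothetical, and the disulfide counts used in the example (three for BPTI, none for T4 lysozyme) are assumptions supplied for illustration rather than values stated in this section.

```python
def presumed_stable(n_residues: int, n_disulfides: int) -> bool:
    """Rule of thumb from the text: at least two disulfides and at
    least one disulfide per twenty residues."""
    if n_residues <= 0:
        raise ValueError("n_residues must be positive")
    return n_disulfides >= 2 and n_disulfides * 20 >= n_residues


# Residue counts are from the text; disulfide counts are assumed.
bpti_ok = presumed_stable(58, 3)          # 3 disulfides cover up to 60 residues
t4_lysozyme_ok = presumed_stable(164, 0)  # no disulfides, rule not met
```

Failing the predicate does not imply instability; the rule only identifies proteins whose stability may be presumed without further data.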

Two general characteristics of the target molecule, size
and charge, make certain classes of IPBDs more likely than
other classes to yield derivatives that will bind specifically
to the target. Because these are very general
characteristics, one can divide all targets into six classes:
a) large positive, b) large neutral, c) large negative, d)
small positive, e) small neutral, and f) small negative. A
small collection of IPBDs, one or a few corresponding to each
class of target, will contain a preferred candidate IPBD for
any chosen target.

Alternatively, the user may elect to engineer a GP(IPBD)
for a particular target; criteria are given below that relate
target size and charge to the choice of IPBD.

II.B. Influence of target size on choice of IPBD:

If the target is a protein or other macromolecule, a
preferred embodiment of the IPBD is a small protein such as
the Cucurbita maxima trypsin inhibitor III (29 residues), BPTI
from Bos taurus (58 residues), crambin from rape seed (46
residues), or the third domain of ovomucoid from Coturnix
coturnix japonica (Japanese quail) (56 residues), because
targets from this class have clefts and grooves that can
accommodate small proteins in highly specific ways. If the
target is a macromolecule lacking a compact structure, such as
starch, it should be treated as if it were a small molecule.
Extended macromolecules with defined 3D structure, such as
collagen, should be treated as large molecules.

If the target is a small molecule, such as a steroid, a
preferred embodiment of the IPBD is a protein of about 80-200
residues, such as ribonuclease from Bos taurus (124 residues),
ribonuclease from Aspergillus oryzae (104 residues), hen egg
white lysozyme from Gallus gallus (129 residues), azurin from
Pseudomonas aeruginosa (128 residues), or T4 lysozyme (164
residues), because such proteins have clefts and grooves into
which the small target molecules can fit. The Brookhaven
Protein Data Bank contains 3D structures for all of the
proteins listed. Genes encoding proteins as large as T4
lysozyme can be manipulated by standard techniques for the
purposes of this invention.

If the target is a mineral, insoluble in water, one
considers the nature of the molecular surface of the mineral.
Minerals that have smooth surfaces, such as crystalline
silicon, are best addressed with medium to large proteins,
such as ribonuclease, as IPBD in order to have sufficient
contact area and specificity. Minerals with rough, grooved
surfaces, such as zeolites, could be bound either by small
proteins, such as BPTI, or larger proteins, such as T4
lysozyme.

II.C. Influence of target charge on choice of IPBD:

Electrostatic repulsion between molecules of like charge
can prevent molecules with highly complementary surfaces from
binding. Therefore, it is preferred that, under the
conditions of intended use, the IPBD and the target molecule
either have opposite charge or that one of them is neutral.
In some cases it has been observed that protein molecules bind
in such a way that like charged groups are juxtaposed by
including oppositely charged counter ions in the molecular
interface. Thus, inclusion of counter ions can reduce or
eliminate electrostatic repulsion and the user may elect to
include ions in the eluants used in the affinity separation
step. Polyvalent ions are more effective at reducing
repulsion than monovalent ions.

II.D. Other considerations in the choice of IPBD:

If the chosen IPBD is an enzyme, it may be necessary to
change one or more residues in the active site to inactivate
enzyme function. For example, if the IPBD were T4 lysozyme
and the GP were E. coli cells or M13, we would need to
inactivate the lysozyme because otherwise it would lyse the
cells. If, on the other hand, the GP were φX174, then
inactivation of lysozyme may not be needed because T4 lysozyme
can be overproduced inside E. coli cells without detrimental
effects and φX174 forms intracellularly. It is preferred to
inactivate enzyme IPBDs that might be harmful to the GP or its
host by substituting mutant amino acids at one or more
residues of the active site. It is permitted to vary one or
more of the residues that were changed to abolish the original
enzymatic activity of the IPBD. Those GPs that receive osp-pbd
genes encoding an active enzyme may die, but the majority
of sequences will not be deleterious.

If the binding protein is intended for therapeutic use in
humans or animals, the IPBD may be chosen from proteins native
to the designated recipient to minimize the possibility of
antigenic reactions.

II.E. Bovine Pancreatic Trypsin Inhibitor (BPTI) as an IPBD:

BPTI is an especially preferred IPBD because it meets or
exceeds all the criteria: it is a small, very stable protein
with a well known 3D structure. Marks et al. (MARK86) have
shown that a fusion of the phoA signal peptide gene fragment
and DNA coding for the mature form of BPTI caused native BPTI
to appear in the periplasm of E. coli, demonstrating that
there is nothing in the structure of BPTI to prevent its being
secreted.

The structure of BPTI is maintained even when one or
another of the disulfides is removed, either by chemical
blocking or by genetic alteration of the amino-acid sequence.
The stabilizing influence of the disulfides in BPTI is not
equally distributed. Goldenberg (GOLD85) reports that
blocking CYS14 and CYS38 lowers the Tm of BPTI to ~75°C while
chemical blocking of either of the other disulfides lowers Tm
to below 40°C. Chemically blocking a disulfide may lower Tm
more than mutating the cysteines to other amino-acid types
because the bulky blocking groups are more destabilizing than
removal of the disulfide. Marks et al. (MARK87) replaced both
CYS14 and CYS38 with either two alanines or two threonines.
The CYS14/CYS38 cystine bridge that Marks et al. removed is
the one very close to the scissile bond in BPTI;
surprisingly, both mutant molecules functioned as trypsin
inhibitors. Schnabel et al. (SCHN86) report preparation of
aprotinin(C14A, C38A) by use of Raney nickel. Eigenbrot et al.
(EIGE90) report the X-ray structure of BPTI(C30A/C51A) which
is stable to at least 50°C. The backbone of this mutant is as
similar to BPTI as are the backbones of BPTI molecules that
sit in different crystal lattices. This indicates that BPTI
is redundantly stable and so is likely to fold into
approximately the same structure despite numerous surface
mutations. Using the knowledge of homologues, vide infra, we
can infer which residues should not be varied if the basic
BPTI structure is to be maintained.

The 3D structure of BPTI has been determined at high
resolution by X-ray diffraction (HUBE77, MARQ83, WLOD84,
WLOD87a, WLOD87b), neutron diffraction (WLOD84), and by NMR
(WAGN87). In one of the X-ray structures deposited in the
Brookhaven Protein Data Bank, entry 6PTI, there was no
electron density for A58, indicating that A58 has no uniquely
defined conformation. Thus we know that the carboxy group
does not make any essential interaction in the folded
structure. The amino terminus of BPTI is very near to the
carboxy terminus. Goldenberg and Creighton reported on
circularized BPTI and circularly permuted BPTI (GOLD83). Some
proteins homologous to BPTI have more or fewer residues at
either terminus.

BPTI has been called "the hydrogen atom of protein
folding" and has been the subject of numerous experimental and
theoretical studies (STAT87, SCHW87, GOLD83, CHAZ83, CREI74,
CREI77a, CREI77b, CREI80, SIEK87, SINH90, RUEH73, HUBE74,
HUBE75, HUBE77 and others).

BPTI has the added advantage that at least 59 homologous
proteins are known. Table 13 shows the sequences of 39
homologues. A tally of ionizable groups in 59 homologues is
shown in Table 14 and the composite of amino acid types
occurring at each residue is shown in Table 15.

BPTI is freely soluble and is not known to bind metal
ions. BPTI has no known enzymatic activity. BPTI is not
toxic.

All of the conserved residues are buried; of the six
fully conserved residues only G37 has noticeable exposure.
The solvent accessibility of each residue in BPTI is given in
Table 16, which was calculated from the entry "6PTI" in the
Brookhaven Protein Data Bank with a solvent radius of 1.4 Å,
the atomic radii given in Table 7, and the method of Lee and
Richards (LEEB71). Each of the 52 non-conserved residues can
accommodate two or more kinds of amino acids. By
independently substituting at each residue only those amino
acids already observed at that residue, we could obtain
approximately 1.6·10^43 different amino acid sequences, most of
which will fold into structures very similar to BPTI.
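
The sequence count quoted above is a product, over the varied positions, of the number of amino acid types observed at each position. The following is a minimal sketch of that arithmetic; the function names and the four-position toy data are hypothetical, since the per-residue composite of Table 15 is not reproduced in this section.

```python
from math import log10, prod


def count_sequences(observed_per_position):
    """Number of sequences obtained by independently choosing, at each
    position, one of the amino acid types observed at that position."""
    return prod(len(set(types)) for types in observed_per_position)


def mean_types_per_position(total_sequences, n_positions):
    """Geometric-mean number of types per position implied by a total."""
    return 10 ** (log10(total_sequences) / n_positions)


# Hypothetical toy data (one string of one-letter codes per position).
toy_positions = ["AG", "KR", "C", "DEN"]
toy_total = count_sequences(toy_positions)

# The figures 1.6e43 and 52 are taken from the paragraph above.
implied_mean = mean_types_per_position(1.6e43, 52)
```

Under these assumptions the quoted total corresponds to a geometric mean of a little under seven observed amino acid types per non-conserved residue.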
BPTI will be especially useful as an IPBD for
macromolecular targets. BPTI and BPTI homologues bind tightly
and with high specificity to a number of enzyme
macromolecules.

BPTI is strongly positively charged except at very high
pH; thus BPTI is useful as IPBD for targets that are not also
strongly positive under the conditions of intended use. There
exist homologues of BPTI, however, having quite different
charges (viz. SCI-III from Bombyx mori at -7 and the trypsin
inhibitor from bovine colostrum at -1). Once a genetic
package is found that displays BPTI on its surface, the
sequence of the BPTI domain can be replaced by one of the
homologous sequences to produce acidic or neutral IPBDs.

BPTI is quite small; if this should cause a
pharmacological problem, two or more BPTI-derived domains may
be joined as in human BPTI homologues, one of which has two
domains (BALD85, ALBR83b) and another has three (WUNT88).

Another possible pharmacological problem is
immunogenicity. BPTI has been used in humans with very few adverse
effects. Siekmann et al. (SIEK89) have studied immunological
characteristics of BPTI and some homologues. It is an
advantage of the method of the present invention that a
variety of SBDs can be obtained so that, if one derivative
proves to be antigenic, a different SBD may be used.
Furthermore, one can reduce the probability of immune response
by starting with a human protein, such as LACI (a BPTI
homologue) (WUNT88, GIRA89) or Inter-α-Trypsin Inhibitor
(ALBR83a, ALBR83b, DIAR90, ENGH89, TRIB86, GEBH86, GEBH90,
KAUM86, ODOM90, SALI90).

Further, a BPTI-derived gene fragment, coding for a novel
binding domain, could be fused in-frame to a gene fragment
coding for other proteins, such as serum albumin or the
constant parts of IgG.

Tschesche et al. (TSCH87) reported on the binding of
several BPTI derivatives to various proteases:

Dissociation constants for BPTI derivatives, Molar.

Residue    Trypsin      Chymotrypsin  Elastase     Elastase
#15        (bovine      (bovine       (porcine     (human
           pancreas)    pancreas)     pancreas)    leukocytes)

lysine     6.0·10^-14   9.0·10^-…                  3.5·10^-…
glycine                                            7.0·10^-9
alanine                               2.8·10^-…    2.5·10^-9
valine                                5.7·10^-…    1.1·10^-10
leucine                               1.9·10^-…    2.9·10^-…

From the report of Tschesche et al. we infer that molecular
pairs marked "+" have Kd's ≥ 3.5·10^-6 M and that molecular pairs
marked "-" have Kd's >> 3.5·10^-6 M. Because of the wealth of
data about the binding of BPTI and various mutants to trypsin
and other proteases (TSCH87), we can proceed in various ways
in optimizing the affinity separation conditions. (For other
PBDs, we can obtain two different monoclonal antibodies, one
with a high affinity having Kd of order 10^-11 M, and one with a
moderate affinity having Kd on the order of 10^-6 M.)
Works concerning BPTI and its homologues include:
KIDO88, PONT88, KIDO90, AUER87, AUER90, SCOT87b, AUER88,
AUER89, BECK88b, WACH79, WACH80, BECK89a, DUFT85, FIOR88,
GIRA89, GOLD84, GOLD88, HOCH84, RITO83, NORR89a, NORR89b,
OLTE89, SWAI88, and WAGN79.

II.F. Mini-Proteins as IPBDs:

A polypeptide is a polymer composed of a single chain of
the same or different amino acids joined by peptide bonds.
Linear peptides can take up a very large number of different
conformations through internal rotations about the main chain
single bonds of each α carbon. These rotations are hindered
to varying degrees by side groups, with glycine interfering
the least, and valine, isoleucine and, especially, proline,
the most. A polypeptide of 20 residues may have 10^20 different
conformations which it may assume by various internal
rotations.

Proteins are polypeptides which, as a result of
stabilizing interactions between amino acids that are not in
adjacent positions in the chain, have folded into a
well-defined conformation. This folding is usually essential to
their biological activity.

For polypeptides of 40-60 residues or longer, noncovalent
forces such as hydrogen bonds, salt bridges, and hydrophobic
"interactions" are sufficient to stabilize a particular
folding or conformation. The polypeptide's constituent
segments are held to more or less that conformation unless it
is perturbed by a denaturant such as rising temperature or
decreasing pH, whereupon the polypeptide unfolds or "melts".
The smaller the peptide, the more likely it is that its
conformation will be determined by the environment. If a
small unconstrained peptide has biological activity, the
peptide ligand will be in essence a random coil until it comes
into proximity with its receptor. The receptor accepts the
peptide only in one or a few conformations because alternative
conformations are disfavored by unfavorable van der Waals and
other non-covalent interactions.

Small polypeptides have potential advantages over larger
polypeptides when used as therapeutic or diagnostic agents,
including (but not limited to):

a) better penetration into tissues,

b) faster elimination from the circulation (important for
imaging agents),

c) lower antigenicity, and

d) higher activity per mass.

Moreover, polypeptides of under about 50 residues have
the advantage of accessibility via chemical synthesis;
polypeptides of under about 30 residues are more easily
synthesized than are larger polypeptides. Thus, it would be
desirable to be able to employ the combination of variegation
and affinity selection to identify small polypeptides which
bind a target of choice.

Polypeptides of this size, however, have disadvantages as
binding molecules. According to Olivera et al. (OLIV90a):
"Peptides in this size range normally equilibrate among many
conformations (in order to have a fixed conformation, proteins
generally have to be much larger)." Specific binding of a
peptide to a target molecule requires the peptide to take up
one conformation that is complementary to the binding site.
For a decapeptide with three isoenergetic conformations (e.g.,
β strand, α helix, and reverse turn) at each residue, there
are about 6·10^4 (that is, 3^10) possible overall conformations.
Assuming these conformations to be equi-probable for the
unconstrained decapeptide, if only one of the possible
conformations bound to the binding site, then the affinity of
the peptide for the target is expected to be about 6·10^4-fold
higher if it could be constrained to that single effective
conformation. Thus, the
unconstrained decapeptide, relative to a decapeptide
constrained to the correct conformation, would be expected to
exhibit lower affinity. It would also exhibit lower
specificity, since one of the other conformations of the
unconstrained decapeptide might be one which bound tightly to
a material other than the intended target. By way of
corollary, it could have less resistance to degradation by
proteases, since it would be more likely to provide a binding
site for the protease.
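
The conformational counting behind this argument can be checked with a few lines of arithmetic. This is a minimal sketch: the function names are hypothetical, and the conversion of the affinity ratio into a free energy (RT ln of the ratio, at an assumed temperature of 298.15 K) is a standard thermodynamic relation added here for illustration, not a calculation given in the text.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol*K)


def n_conformations(states_per_residue: int, n_residues: int) -> int:
    """Overall conformations if each residue independently adopts one of
    a fixed number of isoenergetic local conformations."""
    return states_per_residue ** n_residues


def constraint_free_energy_kj(n_conf: int, temperature_k: float = 298.15) -> float:
    """Free energy (kJ/mol) equivalent to an affinity ratio of n_conf,
    assuming only one of n_conf equiprobable conformations binds."""
    return R * temperature_k * math.log(n_conf) / 1000.0


decapeptide = n_conformations(3, 10)   # three states at each of ten residues
penalty_kj = constraint_free_energy_kj(decapeptide)
```

With three states per residue the decapeptide has 59,049 conformations, which is the "about 6·10^4" of the text; ten states per residue over twenty residues gives the 10^20 figure quoted earlier for a 20-residue polypeptide.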

In one embodiment, the present invention overcomes these
problems, while retaining the advantages of smaller
polypeptides, by fostering the biosynthesis of novel
mini-proteins having the desired binding characteristics.
Mini-proteins are small polypeptides (usually less than about 60
residues) which, while too small to have a stable conformation
as a result of noncovalent forces alone, are covalently
crosslinked (e.g., by disulfide bonds) into a stable
conformation and hence have biological activities more typical
of larger protein molecules than of unconstrained polypeptides
of comparable size.

When mini-proteins are variegated, the residues which are
covalently crosslinked in the parental molecule are left
unchanged, thereby stabilizing the conformation. For example,
in the variegation of a disulfide bonded mini-protein, certain
cysteines are invariant so that under the conditions of
expression and display, covalent crosslinks (e.g., disulfide
bonds between one or more pairs of cysteines) form, and
substantially constrain the conformation which may be adopted
by the hypervariable linearly intermediate amino acids. In
other words, a constraining scaffolding is engineered into
polypeptides which are otherwise extensively randomized.

Once a mini-protein of desired binding characteristics is
characterized, it may be produced, not only by recombinant DNA
techniques, but also by nonbiological synthetic methods.

In vitro, disulfide bridges can form spontaneously in
polypeptides as a result of air oxidation. Matters are more
complicated in vivo. Very few intracellular proteins have
disulfide bridges, probably because a strong reducing
environment is maintained by the glutathione system.
Disulfide bridges are common in proteins that travel or
operate in extracellular spaces, such as snake venoms and
other toxins (e.g., conotoxins, charybdotoxin, bacterial
enterotoxins), peptide hormones, digestive enzymes, complement
proteins, immunoglobulins, lysozymes, protease inhibitors
(BPTI and its homologues, CMTI-III (Cucurbita maxima trypsin
inhibitor III) and its homologues, hirudin, etc.) and milk
proteins.

Disulfide bonds that close tight intrachain loops have 
been found in pepsin, thioredoxin, insulin A-chain, silk 
fibroin, and lipoamide dehydrogenase. The bridged cysteine 
residues are separated by one to four residues along the 
polypeptide chain. Model building, X-ray diffraction 
analysis, and NMR studies have shown that the α carbon path of
such loops is usually flat and rigid. 

There are two types of disulfide bridges in
immunoglobulins. One is the conserved intrachain bridge,
spanning about 60 to 70 amino acid residues and found,
repeatedly, in almost every immunoglobulin domain. Buried
deep between the opposing β sheets, these bridges are shielded
from solvent and ordinarily can be reduced only in the
presence of denaturing agents. The remaining disulfide
bridges are mainly interchain bonds and are located on the
surface of the molecule; they are accessible to solvent and
relatively easily reduced (STEI85). The disulfide bridges of
the mini-proteins of the present invention are intrachain
linkages between cysteines having much smaller chain spacings.

For the purpose of the appended claims, a mini-protein
has between about eight and about sixty residues. However, it
will be understood that a chimeric surface protein presenting
a mini-protein as a domain will normally have more than sixty
residues. Polypeptides containing intrachain disulfide bonds
may be characterized as cyclic in nature, since a closed
circle of covalently bonded atoms is defined by the two
cysteines, the intermediate amino acid residues, their
peptidyl bonds, and the disulfide bond. The terms "cycle",
"span" and "segment" will be used to define certain structural
features of the polypeptides. An intrachain disulfide bridge
connecting amino acids 3 and 8 of a 16 residue polypeptide
will be said herein to have a cycle of 6 and a span of 4. If
amino acids 4 and 12 are also disulfide bonded, then they form
a second cycle of 9 with a span of 7. Together, the four
cysteines divide the polypeptide into four intercysteine
segments (1-2, 5-7, 9-11, and 13-16). (Note that there is no
segment between Cys3 and Cys4.)
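
The definitions of "cycle", "span" and "segment" can be restated as a short sketch that reproduces the worked example above. The function names are hypothetical; residue positions are 1-based, as in the text.

```python
def cycle_and_span(cys_a: int, cys_b: int):
    """Cycle counts both bridged cysteines plus the residues between
    them; span counts only the residues between them."""
    low, high = sorted((cys_a, cys_b))
    return high - low + 1, high - low - 1


def intercysteine_segments(length: int, cysteines):
    """Runs of non-cysteine residues delimited by the crosslinked
    cysteines, as inclusive (start, end) pairs; empty runs are omitted."""
    segments = []
    start = 1
    for pos in sorted(cysteines):
        if pos > start:
            segments.append((start, pos - 1))
        start = pos + 1
    if start <= length:
        segments.append((start, length))
    return segments


# Worked example from the text: a 16-residue polypeptide with
# disulfides 3-8 and 4-12.
first_bridge = cycle_and_span(3, 8)
second_bridge = cycle_and_span(4, 12)
segments = intercysteine_segments(16, [3, 8, 4, 12])
```

Because Cys3 and Cys4 are adjacent, the run between them is empty and is not reported, which matches the note that there is no segment between them.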

The connectivity pattern of a crosslinked mini-protein is
a simple description of the relative location of the termini
of the crosslinks. For example, for a mini-protein with two
disulfide bonds, the connectivity pattern "1-3, 2-4" means
that the first crosslinked cysteine is disulfide bonded to the
third crosslinked cysteine (in the primary sequence), and the
second to the fourth.
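
A connectivity pattern can be derived mechanically from the residue positions of the bonded cysteines by ranking the cysteines in sequence order. The following is a minimal sketch; the function name `connectivity_pattern` and the input format (a list of position pairs) are assumptions made for illustration.

```python
def connectivity_pattern(disulfides) -> str:
    """Convert disulfide pairs given as residue positions into a pattern
    expressed in the ordinal rank of each crosslinked cysteine."""
    cysteines = sorted(pos for pair in disulfides for pos in pair)
    if len(set(cysteines)) != len(cysteines):
        raise ValueError("each cysteine may take part in only one disulfide")
    rank = {pos: k for k, pos in enumerate(cysteines, start=1)}
    bonds = sorted(tuple(sorted((rank[a], rank[b]))) for a, b in disulfides)
    return ", ".join(f"{a}-{b}" for a, b in bonds)


# The 16-residue example used earlier: disulfides 3-8 and 4-12.
pattern = connectivity_pattern([(3, 8), (4, 12)])
```

For that example the four cysteines rank as 3→1, 4→2, 8→3 and 12→4, so the bonds 3-8 and 4-12 are reported as "1-3, 2-4", the pattern described in the paragraph above.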

The degree to which the crosslink constrains the
conformational freedom of the mini-protein, and the degree to
which it stabilizes the mini-protein, may be assessed by a
number of means. These include absorption spectroscopy (which
can reveal whether an amino acid is buried or exposed),
circular dichroism studies (which provide a general picture
of the helical content of the protein), nuclear magnetic
resonance imaging (which reveals the number of nuclei in a
particular chemical environment as well as the mobility of
nuclei), and X-ray or neutron diffraction analysis of protein
crystals. The stability of the mini-protein may be
ascertained by monitoring the changes in absorption at various
wavelengths as a function of temperature, pH, etc.; buried
residues become exposed as the protein unfolds. Similarly,
the unfolding of the mini-protein as a result of denaturing
conditions results in changes in NMR line positions and
widths. Circular dichroism (CD) spectra are extremely
sensitive to conformation.

The variegated disulfide-bonded mini-proteins of the
present invention fall into several classes. 

Class I mini-proteins are those featuring a single pair
of cysteines capable of interacting to form a disulfide bond,
said bond having a span of no more than nine residues. This
disulfide bridge preferably has a span of at least two
residues; this is a function of the geometry of the disulfide
bond. When the spacing is two or three residues, one residue
is preferably glycine in order to reduce the strain on the
bridged residues. The upper limit on spacing is less precise;
however, in general, the greater the spacing, the less the
constraint on conformation imposed on the linearly
intermediate amino acid residues by the disulfide bond.

The main chain of such a peptide has very little freedom,
but is not stressed. The free energy released when the
disulfide forms exceeds the free energy lost by the main chain
when locked into a conformation that brings the cysteines
together. Having lost the free energy of disulfide formation,
the proximal ends of the side groups are held in more or less
fixed relation to each other. When binding to a target, the
domain does not need to expend free energy getting into the
correct conformation. The domain cannot jump into some other
conformation and bind a non-target.

A disulfide bridge with a span of 4 or 5 is especially
preferred. If the span is increased to 6, the constraining
influence is reduced. In this case, we prefer that at least
one of the enclosed residues be an amino acid that imposes
restrictions on the main-chain geometry. Proline imposes the
most restriction. Valine and isoleucine restrict the main
chain to a lesser extent. The preferred position for this
constraining non-cysteine residue is adjacent to one of the
invariant cysteines; however, it may be one of the other
bridged residues. If the span is seven, we prefer to include
two amino acids that limit main-chain conformation. These
amino acids could be at any of the seven positions, but are
preferably the two bridged residues that are immediately
adjacent to the cysteines. If the span is eight or nine,
additional constraining amino acids may be provided.

The disulfide bond of a class I mini-protein is exposed
to solvent. Thus, one should avoid exposing the variegated
population of GPs that display class I mini-proteins to
reagents that rupture disulfides; Creighton names several such
reagents (CREI88).

Class II mini-proteins are those featuring a single
disulfide bond having a span of greater than nine amino acids.
The bridged amino acids form secondary structures which help
to stabilize their conformation. Preferably, these
intermediate amino acids form hairpin supersecondary
structures such as those schematized below:



  |----------S-S----------|
-Cys-αhelix-turn-βstrand-Cys-

  |----------S-S---------|
-Cys-αhelix-turn-αhelix-Cys-

  |----------S-S-----------|
-Cys-βstrand-turn-βstrand-Cys-

Secondary structures are stabilized by hydrogen bonds between
amide nitrogen and carbonyl groups, by interactions between
charged side groups and helix dipoles, and by van der Waals
contacts. One abundant secondary structure in proteins is the
α-helix. The α helix has 3.6 residues per turn, a 1.5 Å rise
per residue, and a helical radius of 2.3 Å. All observed
α-helices are right-handed. The torsion angles φ (-57°) and ψ
(-47°) are favorable for most residues, and the hydrogen bond
between the backbone carbonyl oxygen of each residue and the
backbone NH of the fourth residue along the chain is 2.86 Å
long (nearly the optimal distance) and virtually straight.
Since the hydrogen bonds all point in the same direction, the
α helix has a considerable dipole moment (carboxy terminus
negative).

The β strand may be considered an elongated helix with
2.3 residues per turn, a translation of 3.3 Å per residue, and
a helical radius of 1.0 Å. Alone, a β strand forms no
main-chain hydrogen bonds. Most commonly, β strands are found in
twisted (rather than planar) parallel, antiparallel, or mixed
parallel/antiparallel sheets.

A peptide chain can form a sharp reverse turn. A reverse
turn may be accomplished with as few as four amino acids.
Reverse turns are very abundant, comprising a quarter of all
residues in globular proteins. In proteins, reverse turns
commonly connect β strands to form β sheets, but may also form
other connections. A peptide can also form other turns that
are less sharp.

Based on studies of known proteins, one may calculate the
propensity of a particular residue, or of a particular
dipeptide or tripeptide, to be found in an α helix, β strand
or reverse turn. The normalized frequencies of occurrence of
the amino acid residues in these secondary structures are given
in Table 6-4 of CREI84. For a more detailed treatment of the
prediction of secondary structure from the amino acid
sequence, see Chapter 6 of SCHU79.

In designing a suitable hairpin structure, one may copy
an actual structure from a protein whose three-dimensional
conformation is known, design the structure using frequency
data, or combine the two approaches. Preferably, one or more
actual structures are used as a model, and the frequency data
are used to determine which mutations can be made without
disrupting the structure.

Preferably, no more than three amino acids lie between
the cysteine and the beginning or end of the α helix or β
strand.

More complex structures (such as a double hairpin) are 
also possible. 

Class III mini-proteins are those featuring a plurality
of disulfide bonds. They optionally may also feature
secondary structures such as those discussed above with regard
to Class II mini-proteins. Since the number of possible
disulfide bond topologies increases rapidly with the number of
bonds (two bonds, three topologies; three bonds, 15
topologies; four bonds, 105 topologies) the number of
disulfide bonds preferably does not exceed four. With two or
more disulfide bonds, the disulfide bridge spans preferably do
not exceed 50, and the largest intercysteine chain segment
preferably does not exceed 20.
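
The topology counts quoted above (3, 15, and 105) are the number of ways to pair up 2n cysteines into n disulfides, which equals the product 1 x 3 x ... x (2n-1). The Python sketch below is offered only as a check on that arithmetic; the function names are hypothetical, and the enumeration ignores any geometric constraints that would make some pairings unrealizable.

```python
def disulfide_topologies(n_bonds):
    """Number of ways to pair 2*n_bonds cysteines into n_bonds disulfides."""
    count = 1
    for k in range(1, n_bonds + 1):
        count *= 2 * k - 1
    return count


def pairings(cysteines):
    """Yield every complete pairing of the given cysteine labels."""
    if not cysteines:
        yield []
        return
    first, rest = cysteines[0], cysteines[1:]
    for idx, partner in enumerate(rest):
        remaining = rest[:idx] + rest[idx + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail


for n in (2, 3, 4):
    print(n, disulfide_topologies(n))

# The three connectivity patterns available to two disulfides:
for pattern in pairings([1, 2, 3, 4]):
    print(pattern)
```

For two bonds, the enumeration lists the patterns 1-2/3-4, 1-3/2-4, and 1-4/2-3, using the connectivity notation defined earlier.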

Naturally occurring class III mini-proteins, such as
heat-stable enterotoxin ST-Ia, frequently have pairs of
cysteines that are adjacent in the amino-acid sequence.
Adjacent cysteines are very unlikely to form an intramolecular
disulfide, and cysteines separated by a single amino acid form
an intramolecular disulfide with difficulty and only for
certain intervening amino acids. Thus, clustering cysteines
within the amino-acid sequence reduces the number of
realizable disulfide bonding schemes. We utilize such
clustering in the class III mini-protein disclosed herein.

Metal Finger Mini-Proteins. The mini-proteins of the
present invention are not limited to those crosslinked by
disulfide bonds. Another important class of mini-proteins
consists of analogues of finger proteins. Finger proteins are
characterized by finger structures in which a metal ion is
coordinated by two Cys and two His residues, forming a
tetrahedral arrangement around it. The metal ion is most
often zinc(II), but may be iron, copper, cobalt, etc. The
"finger" has the consensus sequence (Phe or Tyr)-(1 AA)-Cys-
(2-4 AAs)-Cys-(3 AAs)-Phe-(5 AAs)-Leu-(2 AAs)-His-(3 AAs)-His-
(5 AAs) (SEQ ID NOs: 1, 2, 3, 4, 5, 6) (BERG88; GIBS88). While finger
proteins typically contain many repeats of the finger motif,
it is known that a single finger will fold in the presence of
zinc ions (FRAN87; PARR88). There is some dispute as to
whether two fingers are necessary for binding to DNA. The
present invention encompasses mini-proteins with either one or
two fingers. It is to be understood that the target need not
be a nucleic acid.

G. Modified PBSs

There exist a number of enzymes and chemical reagents
that can selectively modify certain side groups of proteins,
including: protein-tyrosine kinase, Ellman's reagent,
methyl transferases (that methylate GLU side groups), serine
kinases, proline hydroxylases, vitamin-K dependent enzymes that
convert GLU to GLA, maleic anhydride, and alkylating agents.
Treatment of the variegated population of GP(PBD)s with one of
these enzymes or reagents will modify the side groups affected
by the chosen enzyme or reagent. Enzymes and reagents that do
not kill the GP are much preferred. Such modification of side
groups can directly affect the binding properties of the
displayed PBDs. Using affinity separation methods, we enrich
for the modified GPs that bind the predetermined target.
Since the active binding domain is not entirely genetically
specified, we must repeat the post-morphogenesis modification
at each enrichment round. This approach is particularly
appropriate with mini-protein IPBDs because we envision
chemical synthesis of these SBDs.

III. VARIEGATION STRATEGY: MUTAGENESIS TO OBTAIN POTENTIAL
BINDING DOMAINS WITH DESIRED DIVERSITY

III.A. Generally

Using standard genetic engineering techniques, a molecule
of variegated DNA can be introduced into a vector so that it
constitutes part of a gene (OLIP86, OLIP87, AUSU87, REID88a).
When vectors containing variegated DNA are used to transform
bacteria, each cell makes a version of the original protein.
Each colony of bacteria may produce a different version from
any other colony. If the variegations of the DNA are
concentrated at loci known to be on the surface of the protein
or in a loop, a population of proteins will be generated,
many members of which will fold into roughly the same 3D
structure as the parent protein. The specific binding
properties of each member, however, may be different from
those of each other member.

We now consider the manner in which we generate a diverse
population of potential binding domains in order to facilitate
selection of a PBD-bearing GP which binds with the requisite
affinity to the target of choice. The potential binding
domains are first designed at the amino acid level. Once we
have identified which residues are to be mutagenized, and
which mutations to allow at those positions, we may then
design the variegated DNA which is to encode the various PBDs
so as to assure that there is a reasonable probability that if
a PBD has an affinity for the target, it will be detected. Of
course, the number of independent transformants obtained and
the sensitivity of the affinity separation technology will
impose limits on the extent of variegation possible within any
single round of variegation.

There are many ways to generate diversity in a protein.
(See RICH86, CARU85, and OLIP86.) At one extreme, we vary a
few residues of the protein as much as possible (inter alia
see CARU85, CARU87, RICH86, and WHAR86). We will call this
approach "Focused Mutagenesis". A typical "Focused
Mutagenesis" strategy is to pick a set of five to seven
residues and vary each through 13-20 possibilities. An
alternative plan of mutagenesis ("Diffuse Mutagenesis") is to
vary many more residues through a more limited set of choices
(See VERS86a and PAKU86). The variegation pattern adopted may
fall between these extremes, e.g., two residues varied through
all twenty amino acids, two more through only two
possibilities, and a fifth into ten of the twenty amino acids.
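
The number of distinct amino-acid sequences implied by a variegation pattern is simply the product of the number of possibilities allowed at each varied residue. The Python sketch below illustrates this arithmetic for the patterns mentioned above; the helper name `sequence_diversity` is hypothetical, and the sketch counts amino-acid sequences only, not the DNA sequences that encode them.

```python
from math import prod


def sequence_diversity(choices_per_residue):
    """Number of amino-acid sequences for a given variegation pattern."""
    return prod(choices_per_residue)


# Focused Mutagenesis, low and high ends of the typical range.
print(sequence_diversity([13] * 5))   # five residues, 13 possibilities each
print(sequence_diversity([20] * 7))   # seven residues, 20 possibilities each

# The intermediate pattern given as an example: two residues through all
# twenty amino acids, two through two possibilities, and one through ten.
print(sequence_diversity([20, 20, 2, 2, 10]))
```

The wide gap between the low and high ends of the Focused Mutagenesis range shows why the choice of pattern must be matched to the number of transformants obtainable.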

There is no fixed limit on the number of codons which can
be mutated simultaneously. However, it is desirable to adopt
a mutagenesis strategy which results in a reasonable
probability that a possible PBD sequence is in fact displayed
by at least one genetic package. When the size of the set of
amino acids potentially encoded by each variable codon is the
same for all variable codons and within the set all amino
acids are equiprobable, this probability may be calculated as
follows: Let r(k,q) be the probability that amino acid number
k will occur at variegated codon q; these codons need not be
contiguous. The probability that a particular vgDNA molecule
will encode a PBD containing n variegated amino acids k_1, ...,
k_n is:

$$p(k_1, \ldots, k_n) = r(k_1, 1) \cdot \ldots \cdot r(k_n, n)$$

Consider a library of N_it independent transformants prepared
with said vgDNA; the probability that the sequence k_1, ..., k_n
is absent is:

$$P(\text{missing } k_1, \ldots, k_n) = \exp\{-N_{it} \cdot p(k_1, \ldots, k_n)\}.$$

$$P(k_1, \ldots, k_n \text{ in lib}) = 1 - \exp\{-N_{it} \cdot p(k_1, \ldots, k_n)\}.$$

Preferably, the probability that a mutein encoded by the vgDNA
and composed of the least favored amino acids at each
variegated position will be displayed by at least one
independent transformant in the library is at least 0.50, and
more preferably at least 0.90. (Muteins composed of more
favored amino acids would of course be more likely to occur in
the same library.)
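
The probability expressions above are easy to evaluate numerically. The following Python sketch is an illustration under stated assumptions, not a prescribed procedure: the function names are hypothetical, and the example values (five codons, each amino acid occurring with probability 1/20, and 10^7 transformants) are chosen only to show the calculation.

```python
import math


def p_sequence(r_values):
    """p(k_1..k_n): product of the per-codon probabilities r(k_q, q)."""
    p = 1.0
    for r in r_values:
        p *= r
    return p


def p_in_library(n_transformants, p_seq):
    """P(sequence in lib) = 1 - exp(-N_it * p)."""
    return 1.0 - math.exp(-n_transformants * p_seq)


def transformants_needed(p_seq, target_probability):
    """Smallest N_it giving at least the target probability of presence."""
    return math.ceil(-math.log(1.0 - target_probability) / p_seq)


# Assumed example: 5 variegated codons, each residue with probability 1/20.
p = p_sequence([1 / 20] * 5)
print(p_in_library(10**7, p))
print(transformants_needed(p, 0.50), transformants_needed(p, 0.90))
```

To apply the preferred criterion stated above, `r_values` would hold the probabilities of the least favored amino acid at each variegated position.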

Preferably, the variegation is such as will cause a
typical transformant population to display 10^6-10^7 different
amino acid sequences by means of preferably not more than
10-fold more (more preferably not more than 3-fold) different DNA
sequences.

For a mini-protein that lacks α helices and β strands,
one will, in any given round of mutation, preferably variegate
each of 4-6 non-cysteine codons so that they each encode at
least eight of the 20 possible amino acids. The variegation
at each codon could be customized to that position.
Preferably, cysteine is not one of the potential
substitutions, though it is not excluded.

When the mini-protein is a metal finger protein, in a
typical variegation strategy, the two Cys and two His
residues, and optionally also the aforementioned Phe/Tyr, Phe
and Leu residues, are held invariant and a plurality (usually
5-10) of the other residues are varied.

When the mini-protein is of the type featuring one or
more α helices and β strands, the set of potential amino acid
modifications at any given position is picked to favor those
which are less likely to disrupt the secondary structure at
that position. Since the number of possibilities at each
variable amino acid is more limited, the total number of
variable amino acids may be greater without altering the
sampling efficiency of the selection process.

For the last-mentioned class of mini-proteins, as well as
domains other than mini-proteins, preferably not more than 20
and more preferably 5-10 codons will be variegated. However,
if diffuse mutagenesis is employed, the number of codons which
are variegated can be higher.

The decision as to which residues to modify is eased by
knowledge of which residues lie on the surface of the domain
and which are buried in the interior.

We choose residues in the IPBD to vary through
consideration of several factors, including: a) the 3D
structure of the IPBD, b) sequences homologous to IPBD, and c)
modeling of the IPBD and mutants of the IPBD. When the number
of residues that could strongly influence binding is greater
than the number that should be varied simultaneously, the user
should pick a subset of those residues to vary at one time.
The user picks trial levels of variegation and calculates the
abundances of various sequences. The list of varied residues
and the level of variegation at each varied residue are
adjusted until the composite variegation is commensurate with
the sensitivity of the affinity separation and the number of
independent transformants that can be made.

Preferably, the abundance of PPBD-encoding DNA is 3 to 10
times higher than both 1/M_ntv and 1/C_sensi to provide a margin of
redundancy. M_ntv is the number of transformants that can be
made from Y_D100 DNA. With current technology M_ntv is
approximately 5·10^8, but the exact value depends on the details
of the procedures adopted by the user. Improvements in
technology that allow more efficient: a) synthesis of DNA, b)
ligation of DNA, or c) transformation of cells will raise the
value of M_ntv. C_sensi is the sensitivity of the affinity
separation; improvements in affinity separation will raise
C_sensi. If the smaller of M_ntv and C_sensi is increased, higher
levels of variegation may be used. For example, if C_sensi is 1
in 10^9 and M_ntv is 10^8, then improvements in C_sensi are less
valuable than improvements in M_ntv.
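
The relationship between the two limits can be made concrete with a small calculation. The Python sketch below is a hedged illustration only: the function names and the default margin of 3 (the low end of the preferred 3-to-10 range) are assumptions, and C_sensi is represented as the number N in a sensitivity of "1 in N".

```python
def minimum_abundance(m_ntv, c_sensi, margin=3.0):
    """Lowest acceptable abundance: margin times the larger of 1/M_ntv, 1/C_sensi."""
    return margin * max(1.0 / m_ntv, 1.0 / c_sensi)


def limiting_factor(m_ntv, c_sensi):
    """Name the smaller quantity, whose improvement permits more variegation."""
    return "M_ntv" if m_ntv < c_sensi else "C_sensi"


# Values from the example in the text: C_sensi of 1 in 10^9, M_ntv of 10^8.
print(limiting_factor(1e8, 1e9))
print(minimum_abundance(1e8, 1e9))
```

With these example values the number of transformants is the smaller quantity, which is why improving it is more valuable than improving the separation sensitivity.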

While variegation normally will involve the substitution
of one amino acid for another at a designated variable codon,
it may involve the insertion or deletion of amino acids as
well.

III.B. Identification of Residues to be Varied

We now consider the principles that guide our choice of
residues of the IPBD to vary. A key concept is that only
structured proteins exhibit specific binding, i.e. can bind to
a particular chemical entity to the exclusion of most others.
Thus the residues to be varied are chosen with an eye to
preserving the underlying IPBD structure. Substitutions that
prevent the PBD from folding will cause GPs carrying those
genes to bind indiscriminately so that they can easily be
removed from the population.

Sauer and colleagues (PAKU86, REID88a), and Caruthers and
colleagues (EISE85) have shown that some residues on the
polypeptide chain are more important than others in
determining the 3D structure of a protein. The 3D structure
is essentially unaffected by the identity of the amino acids
at some loci; at other loci only one or a few types of amino
acid is allowed. In most cases, loci where wide variety is
allowed have the amino acid side group directed toward the
solvent. Loci where limited variety is allowed frequently
have the side group directed toward other parts of the
protein. Thus substitutions of amino acids that are exposed
to solvent are less likely to affect the 3D structure than are
substitutions at internal loci. (See also SCHU79, pp. 169-171
and CREI84, pp. 239-245, 314-315).

The residues that join helices to helices, helices to
sheets, and sheets to sheets are called turns and loops and
have been classified by Richardson (RICH81), Thornton
(THOR88), Sutcliffe et al. (SUTC87a) and others. Insertions
and deletions are more readily tolerated in loops than
elsewhere. Thornton et al. (THOR88) have summarized many
observations indicating that related proteins usually differ
most at the loops which join the more regular elements of
secondary structure. (These observations are relevant not
only to the variegation of potential binding domains but also
to the insertion of binding domains into an outer surface
protein of a genetic package, as discussed in a later
section.)

Burial of hydrophobic surfaces so that bulk water is
excluded is one of the strongest forces driving the binding of
proteins to other molecules. Bulk water can be excluded from
the region between two molecules only if the surfaces are
complementary. We should test as many surface variations as
possible to find one that is complementary to the target. The
selection-through-binding isolates those proteins that are
more nearly complementary to some surface on the target.

Proteins do not have distinct, countable faces.
Therefore we define an "interaction set" to be a set of
residues such that all members of the set can simultaneously
touch one molecule of the target material without any atom of
the target coming closer than van der Waals distance to any
main-chain atom of the IPBD. The concept of a residue
"touching" a molecule of the target is discussed below. From
a picture of BPTI (such as Figure 6-10, p. 225 of CREI84) we
can see that residues 3, 7, 8, 10, 13, 39, 41, and 42 can all
simultaneously contact a molecule the size and shape of
myoglobin. We also see that residue 49 cannot touch a single
myoglobin molecule simultaneously with any of the first set
even though all are on the surface of BPTI. (It is not the
intent of the present invention, however, to suggest that use
of models is required to determine which part of the target
molecule will actually be the site of binding by PBD.)

Variations in the position, orientation and nature of the
side chains of the residues of the interaction set will alter
the shape of the potential binding surface defined by that
set. Any individual combination of such variations may result
in a surface shape which is a better or a worse fit for the
target surface. The effective diversity of a variegated
population is measured by the number of distinct shapes the
potentially complementary surfaces of the PBD can adopt,
rather than the number of protein sequences. Thus, it is
preferable to maximize the former number, when our knowledge
of the IPBD permits us to do so.

To maximize the number of surface shapes generated
when N residues are varied, all residues varied in a given
round of variegation should be in the same interaction set
because variation of several residues in one interaction set
generates an exponential number of different shapes of the
potential binding surface.

If cassette mutagenesis is to be used to introduce the
variegated DNA into the ipbd gene, the protein residues to be
varied are, preferably, close enough together in sequence that
the variegated DNA (vgDNA) encoding all of them can be made in
one piece. The present invention is not limited to a
particular length of vgDNA that can be synthesized. With
current technology, a stretch of 60 amino acids (180 DNA
bases) can be spanned.

Further, when there is reason to mutate residues further
than sixty residues apart, one can use other mutational means,
such as single-stranded-oligonucleotide-directed mutagenesis
(BOTS85) using two or more mutating primers.

Alternatively, to vary residues separated by more than
sixty residues, two cassettes may be mutated as follows: 1)
vgDNA having a low level of variegation (for example, 20 to
400 fold variegation) is introduced into one cassette in the
OCV, 2) cells are transformed and cultured, 3) vg OCV DNA is
obtained, 4) a second segment of vgDNA is inserted into a
second cassette in the OCV, and 5) cells are transformed and
cultured, GPs are harvested and subjected to
selection-through-binding.

The composite level of variation preferably does not
exceed the prevailing capabilities to a) produce very large
numbers of independently transformed cells or b) detect small
components in a highly varied population. The limits on the
level of variegation are discussed later.

Data about the IPBD and the target that are useful in
deciding which residues to vary in the variegation cycle
include: 1) 3D structure, or at least a list of residues on
the surface of the IPBD, 2) list of sequences homologous to
IPBD, and 3) model of the target molecule or a stand-in for
the target.

These data and an understanding of the behavior of
different amino acids in proteins will be used to answer two
questions:

1) which residues of the IPBD are on the outside and close
   enough together in space to touch the target
   simultaneously?
2) which residues of the IPBD can be varied with high
   probability of retaining the underlying IPBD structure?

Although an atomic model of the target material (obtained
through X-ray crystallography, NMR, or other means) is
preferred in such examination, it is not necessary. For
example, if the target were a protein of unknown 3D structure,
it would be sufficient to know the molecular weight of the
protein and whether it were a soluble globular protein, a
fibrous protein, or a membrane protein. Physical
measurements, such as low-angle neutron diffraction, can
determine the overall molecular shape, viz. the ratios of the
principal moments of inertia. One can then choose a protein
of known structure of the same class and similar size and
shape to use as a molecular stand-in and yardstick. It is not
essential to measure the moments of inertia of the target
because, at low resolution, all proteins of a given size and
class look much the same. The specific volumes are the same,
all are more or less spherical and therefore all proteins of
the same size and class have about the same radius of
curvature. The radii of curvature of the two molecules
determine how much of the two molecules can come into contact.

The most appropriate method of picking the residues of
the protein chain at which the amino acids should be varied is
by viewing, with interactive computer graphics, a model of the
IPBD. A stick-figure representation of molecules is
preferred. A suitable set of hardware is an Evans &
Sutherland PS390 graphics terminal (Evans & Sutherland
Corporation, Salt Lake City, UT) and a MicroVAX II
supermicrocomputer (Digital Equipment Corp., Maynard, MA). The computer
should, preferably, have at least 150 megabytes of disk
storage, so that the Brookhaven Protein Data Bank can be kept
on line. A FORTRAN compiler, or some equally good
higher-level language processor is preferred for program development.
Suitable programs for viewing and manipulating protein models
include: a) PS-FRODO, written by T. A. Jones (JONE85) and
distributed by the Biochemistry Department of Rice University,
Houston, TX; and b) PROTEUS, developed by Dayringer,
Tramontano, and Fletterick (DAYR86). Important features of
PS-FRODO and PROTEUS that are needed to view and manipulate
protein models for the purposes of the present invention are
the abilities to: 1) display molecular stick figures of
proteins and other molecules, 2) zoom and clip images in real
time, 3) prepare various abstract representations of the
molecules, such as a line joining Cαs and side group atoms, 4)
compute and display solvent-accessible surfaces reasonably
quickly, 5) point to and identify atoms, and 6) measure
distance between atoms.

In addition, one could use theoretical calculations, such
as dynamic simulations of proteins, to estimate whether a
substitution at a particular residue of a particular
amino-acid type might produce a protein of approximately the same 3D
structure as the parent protein. Such calculations might also
indicate whether a particular substitution will greatly affect
the flexibility of the protein; calculations of this sort may
be useful but are not required.

Residues whose mutagenesis is most likely to affect
binding to a target molecule, without destabilizing the
protein, are called the "principal set". Using the knowledge
of which residues are on the surface of the IPBD (as noted
above), we pick residues that are close enough together on the
surface of the IPBD to touch a molecule of the target
simultaneously without having any IPBD main-chain atom come
closer than van der Waals distance (viz. 4.0 to 5.0 Å) to
any target atom. For the purposes of the present invention, a
residue of the IPBD "touches" the target if: a) a main-chain
atom is within van der Waals distance, viz. 4.0 to 5.0 Å, of
any atom of the target molecule, or b) the Cβ is within D_cutoff
of any atom of the target molecule so that a side-group atom
could make contact with that atom.

Because side groups differ in size (cf. Table 35), some
judgment is required in picking D_cutoff. In the preferred
embodiment, we will use D_cutoff = 8.0 Å, but other values in the
range 6.0 Å to 10.0 Å could be used. If IPBD has G at a
residue, we construct a pseudo Cβ with the correct bond
distance and angles and judge the ability of the residue to
touch the target from this pseudo Cβ.
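
The "touching" criterion can be expressed as a simple distance test on atomic coordinates. The Python sketch below is one possible reading of the two conditions, not a definitive implementation: the function name `touches` and the coordinate layout are hypothetical, and the choice of 5.0 Å (the upper end of the stated 4.0 to 5.0 Å range) as the van der Waals threshold is an assumption. Coordinates are (x, y, z) tuples in Å.

```python
import math

VDW_DISTANCE = 5.0  # Å; assumed threshold, upper end of the 4.0-5.0 Å range
D_CUTOFF = 8.0      # Å; preferred value, 6.0-10.0 Å also contemplated


def touches(main_chain_atoms, c_beta, target_atoms,
            vdw=VDW_DISTANCE, d_cutoff=D_CUTOFF):
    """Return True if a residue 'touches' the target.

    a) any main-chain atom lies within vdw of any target atom, or
    b) the C-beta (or pseudo C-beta for glycine) lies within d_cutoff
       of any target atom.
    """
    for t in target_atoms:
        if any(math.dist(m, t) <= vdw for m in main_chain_atoms):
            return True
        if math.dist(c_beta, t) <= d_cutoff:
            return True
    return False


# Toy coordinates: one main-chain atom at the origin, C-beta 1.5 Å away.
print(touches([(0.0, 0.0, 0.0)], (1.5, 0.0, 0.0), [(9.0, 0.0, 0.0)]))
print(touches([(0.0, 0.0, 0.0)], (1.5, 0.0, 0.0), [(12.0, 0.0, 0.0)]))
```

For a glycine residue, the caller would pass the constructed pseudo Cβ position as `c_beta`.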

Alternatively, we choose a set of residues on the surface
of the IPBD such that the curvature of the surface defined by
the residues in the set is not so great that it would prevent
contact between all residues in the set and a molecule of the
target. This method is appropriate if the target is a
macromolecule, such as a protein, because the PBDs derived
from the IPBD will contact only a part of the macromolecular
surface. The surfaces of macromolecules are irregular with
varying curvatures. If we pick residues that define a surface
that is not too convex, then there will be a region on a
macromolecular target with a compatible curvature.

In addition to the geometrical criteria, we prefer that
there be some indication that the underlying IPBD structure
will tolerate substitutions at each residue in the principal
set of residues. Indications could come from various sources,
including: a) homologous sequences, b) static computer
modeling, or c) dynamic computer simulations.

The residues in the principal set need not be contiguous
in the protein sequence and usually are not. The exposed
surfaces of the residues to be varied do not need to be
connected. We desire only that the amino acids in the
residues to be varied all be capable of touching a molecule of
the target material simultaneously without having atoms
overlap. If the target were, for example, horse heart
myoglobin, and if the IPBD were BPTI, any set of residues in
one interaction set of BPTI defined in Table 34 could be
picked.

The secondary set comprises those residues not in the
primary set that touch residues in the primary set. These
residues might be excluded from the primary set because: a)
the residue is internal, b) the residue is highly conserved,
or c) the residue is on the surface, but the curvature of the
IPBD surface prevents the residue from being in contact with
the target at the same time as one or more residues in the
primary set.

Internal residues are frequently conserved and the amino
acid type cannot be changed to a significantly different type
without substantial risk that the protein structure will be
disrupted. Nevertheless, some conservative changes of
internal residues, such as I to L or F to Y, are tolerated.
Such conservative changes subtly affect the placement and
dynamics of adjacent protein residues and such "fine tuning"
may be useful once an SBD is found.

Surface residues in the secondary set are most often
located on the periphery of the principal set. Such
peripheral residues cannot make direct contact with the
target simultaneously with all the other residues of the
principal set. The charge on the amino acid in one of these
residues could, however, have a strong effect on binding.
Once an SBD is found, it is appropriate to vary the charge of
some or all of these residues. For example, the variegated
codon containing equimolar A and G at base 1, equimolar C and
A at base 2, and A at base 3 yields amino acids T, A, K, and E
with equal probability.
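
The amino-acid distribution produced by a variegated codon follows directly from the standard genetic code and the base mixture at each position. The Python sketch below works through the example just given; the function name `variegated_codon_distribution` is hypothetical, and the sketch assumes that the bases at the three positions are incorporated independently in the stated molar fractions.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def variegated_codon_distribution(base1, base2, base3):
    """Map each encoded amino acid to its probability.

    Each argument is a dict of base -> molar fraction at that position.
    """
    dist = Counter()
    for (b1, f1), (b2, f2), (b3, f3) in product(
            base1.items(), base2.items(), base3.items()):
        dist[CODON_TABLE[b1 + b2 + b3]] += f1 * f2 * f3
    return dict(dist)


# Equimolar A/G at base 1, equimolar C/A at base 2, A at base 3.
print(variegated_codon_distribution({"A": 0.5, "G": 0.5},
                                    {"C": 0.5, "A": 0.5},
                                    {"A": 1.0}))
```

The four codons generated are ACA, AAA, GCA, and GAA, encoding one neutral polar, one basic, one small nonpolar, and one acidic residue, which is why this mixture is convenient for varying charge.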

The assignment of residues to the primary and secondary
sets may be based on: a) geometry of the IPBD and the
geometrical relationship between the IPBD and the target (or a
stand-in for the target) in a hypothetical complex, and b)
sequences of proteins homologous to the IPBD. However, it
should be noted that the distinction between the principal set
and the secondary set is one more of convenience than of
substance; we could just as easily have assigned each amino
acid residue in the domain a preference score that weighed
together the different considerations affecting whether they
are suitable for variegation, and then ranked the residues in
order, from most preferred to least.

For any given round of variegation, it may be necessary
to limit the variegation to a subset of the residues in the
primary and secondary sets, based on geometry and on the
maximum allowed level of variegation that assures
progressivity. The allowed level of variegation determines
how many residues can be varied at once; geometry determines
which ones.

The user may pick residues to vary in many ways. For
example, pairs of residues are picked that are diametrically
opposed across the face of the principal set. Two such pairs
are used to delimit the surface, up/down and right/left.
Alternatively, three residues that form an inscribed triangle,
having as large an area as possible, on the surface are
picked. One to three other residues are picked in a
checkerboard fashion across the interaction surface. Choice
of widely spaced residues to vary creates the possibility for
high specificity because all the intervening residues must
have acceptable complementarity before favorable interactions
can occur at widely-separated residues.

The number of residues picked is coupled to the range
through which each can be varied by the restrictions discussed
below. In the first round, we do not assume any binding
between IPBD and the target and so progressivity is not an
issue. At the first round, the user may elect to produce a
level of variegation such that each molecule of vgDNA is
potentially different through, for example, unlimited
variegation of 10 codons (20¹⁰ ≈ 10¹³). One run of the
DNA synthesizer produces approximately 10¹³ molecules of length
100 nts. Inefficiencies in ligation and transformation will
reduce the number of proteins actually tested to between 10⁷
and 5·10⁸. Multiple replications of the process with such very
high levels of variegation will not yield repeatable results;
the user decides whether this is important.
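
The arithmetic behind this paragraph can be sketched as follows. This is an illustration only; `fraction_tested` is a hypothetical helper, and the two library sizes are the bounds quoted in the text.

```python
n_codons = 10
possible = 20 ** n_codons  # unlimited variegation of 10 codons

def fraction_tested(n_tested, n_possible=possible):
    """Upper bound on the fraction of possible sequences that are tested."""
    return n_tested / n_possible

for tested in (1e7, 5e8):
    print(f"{tested:.0e} proteins tested: at most "
          f"{fraction_tested(tested):.1e} of {possible:.2e} sequences")
```

Because so small a fraction of the possible sequences is ever examined, two independent libraries are unlikely to contain the same members, which is consistent with the remark that replications will not yield repeatable results.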

III.C. Determining the Substitution Set for Each Parental
Residue

Having picked which residues to vary, we now decide the
range of amino acids to allow at each variable residue. The
total level of variegation is the product of the number of
variants at each varied residue. Each varied residue can have
a different scheme of variegation, producing 2 to 20 different
possibilities. The set of amino acids which are potentially
encoded by a given variegated codon is called its
"substitution set".

The computer that controls a DNA synthesizer, such as the
Milligen 7500, can be programmed to synthesize any base of an
oligo-nt with any distribution of nts by taking some nt
substrates (e.g. nt phosphoramidites) from each of two or more
reservoirs. Alternatively, nt substrates can be mixed in any
ratios and placed in one of the extra reservoirs for so-called
"dirty bottle" synthesis. Each codon could be programmed
differently. The "mix" of bases at each nucleotide position
of the codon determines the relative frequency of occurrence
of the different amino acids encoded by that codon.

Simply variegated codons are those in which those
nucleotide positions which are degenerate are obtained from a
mixture of two or more bases mixed in equimolar proportions.
These mixtures are described in this specification by means of
the standardized "ambiguous nucleotide" code (Table 1 and 37
CFR §1.822). In this code, for example, in the degenerate
codon "SNT", "S" denotes an equimolar mixture of bases G and
C, "N", an equimolar mixture of all four bases, and "T", the
single invariant base thymidine.

Complexly variegated codons are those in which at least
one of the three positions is filled by a base from a
non-equimolar mixture of two or more bases.

Either simply or complexly variegated codons may be used
to achieve the desired substitution set.

If we have no information indicating that a particular
amino acid or class of amino acid is appropriate, we strive to
substitute all amino acids with equal probability because
representation of one mini-protein above the detectable level
is wasteful. Equal amounts of all four nts at each position
in a codon (NNN) yield the amino acid distribution in which
each amino acid is present in proportion to the number of
codons that code for it. This distribution has the
disadvantage of giving two basic residues for every acidic
residue. In addition, six times as much R, S, and L as W or M
occur. If five codons are synthesized with this distribution,
each of the 243 sequences encoding some combination of L, R,
and S is 7776 times more abundant than each of the 32
sequences encoding some combination of W and M. To have five
Ws present at detectable levels, we must have each of the
(L,R,S) sequences present in 7776-fold excess.
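
The figures quoted for NNN follow directly from codon counts in the standard genetic code. The sketch below is illustrative; the variable names are assumptions, and K and R are taken as the basic residues and D and E as the acidic ones, as the text implies.

```python
from collections import Counter

# Standard genetic code in TCAG order; "*" marks a stop codon.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")

# With equimolar NNN, abundance is proportional to the number of codons.
codons_per_aa = Counter(AMINO)

basic = codons_per_aa["K"] + codons_per_aa["R"]
acidic = codons_per_aa["D"] + codons_per_aa["E"]

n = 5  # number of variegated codons in the example
lrs_sequences = 3 ** n  # amino-acid sequences built only from L, R, S
wm_sequences = 2 ** n   # amino-acid sequences built only from W, M
excess = (codons_per_aa["L"] // codons_per_aa["W"]) ** n

print(basic, acidic, lrs_sequences, wm_sequences, excess)
```

The ratio of six codons to one at each position compounds over five positions to 6⁵ = 7776.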

Preferably, we also consider the interactions between the
sites of variegation and the surrounding DNA. If the method
of mutagenesis to be used is replacement of a cassette, we
consider whether the variegation will generate gratuitous
restriction sites and whether they seriously interfere with
the intended introduction of diversity. We reduce or
eliminate gratuitous restriction sites by appropriate choice
of variegation pattern and silent alteration of codons
neighboring the sites of variegation.

It is generally accepted that the sequence of amino acids
in a protein or polypeptide determines the three-dimensional
structure of the molecule, including the possibility of no
definite structure. Among polypeptides of definite length and
sequence, some have a defined tertiary structure and most do
not.

Particular amino acid residues can influence the tertiary
structure of a defined polypeptide in several ways, including
by:

a) affecting the flexibility of the polypeptide main chain,

b) adding hydrophobic groups,

c) adding charged groups,

d) allowing hydrogen bonds, and

e) forming cross-links, such as disulfides, chelation to
metal ions, or bonding to prosthetic groups.

Most works on proteins classify the twenty amino acids into
categories such as hydrophobic/hydrophilic,
positive/negative/neutral, or large/small. These
classifications are useful rules of thumb, but one must be
careful not to oversimplify. Proteins contain a variety of
identifiable secondary structural features, including: a) α
helices, b) 3-10 helices, c) anti-parallel β sheets, d)
parallel β sheets, e) Ω loops, f) reverse turns, and g)
various cross links. Many people have analyzed proteins of
known structures and assigned each amino acid to one category
or another. Using the frequency at which particular amino
acids occur in various types of secondary structures, people
have a) tried to predict the secondary structures of proteins
for which only the amino-acid sequence is known (CHOU74,
CHOU78a, CHOU78b), and b) designed proteins de novo that have
a particular set of secondary structural elements (DEGR87,
HECH90). Although some amino acids show definite predilection
for one secondary form (e.g. VAL for β structure and ALA for α
helices), these preferences are not very strong; Creighton has
tabulated the preferences (CREI84). In only seven cases does
the tendency exceed 2.0:

    LEU    α/turn    2.2

Every amino-acid type has been observed in every identified
secondary structural motif. ARG is particularly
indiscriminate.

PRO is generally taken to be a helix breaker.
Nevertheless, proline often occurs at the beginning of helices
or even in the middle of a helix, where it introduces a slight
bend in the helix. Matthews and coworkers replaced a PRO that
occurs near the middle of an α helix in T4 lysozyme. To their
surprise, the "improved" protein is less stable than the
wild-type. The rest of the structure had been adapted to fit the
bent helix.

Lundeen (LUND86) has tabulated the frequencies of amino
acids in helices, β strands, turns, and coil in proteins of
known 3D structure and has distinguished between CYSs having
free thiol groups and half cystines. He reports that free CYS
is found most often in helices while half cystines are found
more often in β sheets. Half cystines are, however, regularly
found in helices. Pease et al. (PEAS90) constructed a peptide
having two cystines; one end of each is in a very stable α
helix. Apamin has a similar structure (WEMM83, PEAS88).

Flexibility:

GLY is the smallest amino acid, having two hydrogens
attached to the Cα. Because GLY has no Cβ, it confers the most
flexibility on the main chain. Thus GLY occurs very
frequently in reverse turns, particularly in conjunction with
PRO, ASP, ASN, SER, and THR.

The amino acids ALA, SER, CYS, ASP, ASN, LEU, MET, PHE,
TYR, TRP, ARG, HIS, GLU, GLN, and LYS have unbranched β
carbons. Of these, the side groups of SER, ASP, and ASN
frequently make hydrogen bonds to the main chain and so can
take on main-chain conformations that are energetically
unfavorable for the others. VAL, ILE, and THR have branched β
carbons, which makes the extended main-chain conformation more
favorable. Thus VAL and ILE are most often seen in β sheets.
Because the side group of THR can easily form hydrogen bonds
to the main chain, it has less tendency to exist in a β sheet.

The main chain of proline is particularly constrained by
the cyclic side group. The φ angle is always close to -60°.
Most prolines are found near the surface of the protein.

Charge:

LYS and ARG carry a single positive charge at any pH
below 10.4 or 12.0, respectively. Nevertheless, the methylene
groups, four and three respectively, of these amino acids are
capable of hydrophobic interactions. The guanidinium group of
ARG is capable of donating five hydrogens simultaneously,
while the amino group of LYS can donate only three.
Furthermore, the geometries of these groups are quite
different, so that these groups are often not interchangeable.

ASP and GLU carry a single negative charge at any pH
above ≈4.5 and 4.6, respectively. Because ASP has but one
methylene group, few hydrophobic interactions are possible.
The geometry of ASP lends itself to forming hydrogen bonds to
main-chain nitrogens, which is consistent with ASP being found
very often in reverse turns and at the beginning of helices.
GLU is more often found in α helices and particularly in the
amino-terminal portion of these helices because the negative
charge of the side group has a stabilizing interaction with
the helix dipole (NICH88, SALI88).

HIS has an ionization pK in the physiological range, viz.
6.2. This pK can be altered by the proximity of charged
groups or of hydrogen donators or acceptors. HIS is capable
of forming bonds to metal ions such as zinc, copper, and iron.

Hydrogen bonds:

Aside from the charged amino acids, SER, THR, ASN, GLN, 
TYR, and TRP can participate in hydrogen bonds. 
Cross links: 

The most important form of cross link is the disulfide
bond formed between two thiols, especially the thiols of CYS
residues. In a suitably oxidizing environment, these bonds
form spontaneously. These bonds can greatly stabilize a
particular conformation of a protein or mini-protein. When a
mixture of oxidized and reduced thiol reagents is present,
exchange reactions take place that allow the most stable
conformation to predominate. Concerning disulfides in
proteins and peptides, see also KATZ90, MATS89, PERR84,
PERR86, SAUE86, WELL86, JANA89, HORV89, KISH85, and SCHN86.

Other cross links that form without need of specific
enzymes include:

1) (CYS)₄:Fe             Rubredoxin (in CREI84, P.376)
                         (SEQ ID NO:122)

2) (CYS)₄:Zn             Aspartate Transcarbamylase (in
                         CREI84, P.376) and Zn-fingers
                         (HARD90) (SEQ ID NO:122)

3) (HIS)₂(MET)(CYS):Cu   Azurin (in CREI84, P.376) and Basic
                         "Blue" Cu Cucumber protein (GUSS88)
                         (SEQ ID NO:123)

4) (HIS)₄:Cu             CuZn superoxide dismutase (SEQ ID
                         NO:124)

5) (CYS)₄:(Fe₄S₄)        Ferredoxin (in CREI84, P.376) (SEQ ID
                         NO:122)

6) (CYS)₂(HIS)₂:Zn       Zinc-fingers (GIBS88) (SEQ ID NO:125)

7) (CYS)₃(HIS):Zn        Zinc-fingers (GAUS87, GIBS88) (SEQ ID
                         NO:126)

Cross links having (HIS)₂(MET)(CYS):Cu (SEQ ID NO:123) have the
potential advantage that HIS and MET cannot form other cross
links without Cu.

Simply Variegated Codons 

The following simply variegated codons are useful because 
they encode a relatively balanced set of amino acids: 

1) SNT which encodes the set [L,P,H,R,V,A,D,G]: a) one
acidic (D) and one basic (R), b) both aliphatic (L,V) and
aromatic hydrophobics (H), c) large (L,R,H) and small
(G,A) side groups, d) rigid (P) and flexible (G) amino
acids, e) each amino acid encoded once.

2) RNG which encodes the set [M,T,K,R,V,A,E,G]: a) one
acidic and two basic (not optimal, but acceptable), b)
hydrophilics and hydrophobics, c) each amino acid encoded
once.

3) RMG which encodes the set [T,K,A,E]: a) one acidic, one
basic, one neutral hydrophilic, b) three favor α helices,
c) each amino acid encoded once.

4) VNT which encodes the set [L,P,H,R,I,T,N,S,V,A,D,G]: a)
one acidic, one basic, b) all classes: charged, neutral
hydrophilic, hydrophobic, rigid and flexible, etc., c)
each amino acid encoded once.

5) RRS which encodes the set [N,S,K,R,D,E,G₂]: a) two
acidics, two basics, b) two neutral hydrophilics, c) only
glycine encoded twice.

6) NNT which encodes the set
[F,S,Y,C,L,P,H,R,I,T,N,V,A,D,G]: a) sixteen DNA
sequences provide fifteen different amino acids; only
serine is repeated, all others are present in equal
amounts (This allows very efficient sampling of the
library.), b) there are equal numbers of acidic and basic
amino acids (D and R, once each), c) all major classes of
amino acids are present: acidic, basic, aliphatic
hydrophobic, aromatic hydrophobic, and neutral
hydrophilic.

7) NNG, which encodes the set [L₂,R₂,S,W,P,Q,M,T,K,V,A,E,G,
stop]: a) fair preponderance of residues that favor
formation of α-helices [L,M,A,Q,K,E; and, to a lesser
extent, S,R,T]; b) encodes 13 different amino acids. (VHG
encodes a subset of the set encoded by NNG: 9 amino acids
in nine different DNA sequences, with equal
acids and bases, and 5/9 being α-helix-favoring.)

For the initial variegation, NNT is preferred, in most
cases. However, when the codon is encoding an amino acid to
be incorporated into an α helix, NNG is preferred.
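
The substitution sets listed above can be reproduced by expanding each ambiguous-nucleotide codon against the standard genetic code. The sketch below is illustrative; `IUPAC`, `GENETIC_CODE`, and `substitution_set` are names chosen here, not taken from the specification.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Standard genetic code, built in TCAG order; "*" marks a stop codon.
GENETIC_CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
                for i, a in enumerate(BASES)
                for j, b in enumerate(BASES)
                for k, c in enumerate(BASES)}

# Ambiguous-nucleotide symbols, each read as an equimolar mixture.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
         "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
         "H": "ACT", "V": "ACG", "N": "ACGT"}

def substitution_set(codon):
    """Count the DNA sequences of a degenerate codon that give each residue."""
    expanded = product(*(IUPAC[symbol] for symbol in codon))
    return Counter(GENETIC_CODE["".join(triplet)] for triplet in expanded)

for codon in ("SNT", "RNG", "RMG", "VNT", "RRS", "NNT", "NNG", "VHG"):
    counts = substitution_set(codon)
    residues = "".join(sorted(aa for aa in counts if aa != "*"))
    print(f"{codon}: {sum(counts.values()):2d} DNA sequences, "
          f"{len(residues):2d} amino acids [{residues}], "
          f"stops={counts.get('*', 0)}")
```

A count greater than one in the returned `Counter` marks a residue encoded more than once, such as glycine in RRS or serine in NNT.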

Below, we analyze several simple variegations as to the
efficiency with which the libraries can be sampled.

Libraries of random hexapeptides encoded by (NNK)₆ have
been reported (SCOT90, CWIR90). Table 130 shows the expected
behavior of such libraries. NNK produces single codons for
PHE, TYR, CYS, TRP, HIS, GLN, ILE, MET, ASN, LYS, ASP, and GLU
(α set); two codons for each of VAL, ALA, PRO, THR, and GLY (Φ
set); and three codons for each of LEU, ARG, and SER (Ω set).
We have separated the 64,000,000 possible sequences into 28
classes, shown in Table 130A, based on the number of amino
acids from each of these sets. The largest class is ΦΩαααα
with ≈14.6% of the possible sequences. Aside from any
selection, all the sequences in one class have the same
probability of being produced. Table 130B shows the
probability that a given DNA sequence taken from the (NNK)₆
library will encode a hexapeptide belonging to one of the
defined classes; note that only ≈6.3% of DNA sequences belong
to the ΦΩαααα class.
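
The class counts and the two percentages quoted for the largest class can be reproduced as follows. This is a sketch under stated assumptions: the set sizes and codon multiplicities are those given in the text, the single NNK stop codon (TAG) is assumed to vanish so that 31 sense codons remain per position, and the names `SETS`, `multinomial`, and `classes` are illustrative.

```python
from math import factorial

# set name: (number of amino acids, NNK codons per amino acid)
SETS = {"alpha": (12, 1), "phi": (5, 2), "omega": (3, 3)}
LENGTH = 6
SENSE_CODONS = 31  # 32 NNK codons minus the TAG stop

def multinomial(a, p, o):
    """Number of ways to arrange a alpha, p phi and o omega positions."""
    return factorial(a + p + o) // (factorial(a) * factorial(p) * factorial(o))

def classes(length=LENGTH):
    """Return [((a, p, o), peptide count, DNA probability)] for every class."""
    (n_a, k_a), (n_p, k_p), (n_o, k_o) = SETS["alpha"], SETS["phi"], SETS["omega"]
    rows = []
    for a in range(length + 1):
        for p in range(length + 1 - a):
            o = length - a - p
            ways = multinomial(a, p, o)
            peptides = ways * n_a**a * n_p**p * n_o**o
            dna = ways * (n_a * k_a)**a * (n_p * k_p)**p * (n_o * k_o)**o
            rows.append(((a, p, o), peptides, dna / SENSE_CODONS**length))
    return rows

table = classes()
largest = max(table, key=lambda row: row[1])
print(len(table), largest[0], largest[1] / 20**LENGTH, largest[2])
```

The largest class has four α residues, one Φ, and one Ω; it holds about 14.6% of the amino-acid sequences but only about 6.3% of the DNA sequences.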

Table 130C shows the expected numbers of sequences in
each class for libraries containing various numbers of
independent transformants (viz. 10⁶, 3·10⁶, 10⁷, 3·10⁷, 10⁸,
3·10⁸, 10⁹, and 3·10⁹). At 10⁶ independent transformants (ITs),
we expect to see 56% of the ΩΩΩΩΩΩ class, but only 0.1% of the
αααααα class. The vast majority of sequences seen come from
classes for which less than 10% of the class is sampled.
Suppose a peptide from, for example, class ΦΦΩΩαα is isolated
by fractionating the library for binding to a target.
Consider how much we know about peptides that are related to
the isolated sequence. Because only 4% of the ΦΦΩΩαα class
was sampled, we cannot conclude that the amino acids from the
Ω set are in fact the best from the Ω set. We might have LEU
at position 2, but ARG or SER could be better. Even if we
isolate a peptide of the ΩΩΩΩΩΩ class, there is a noticeable
chance that better members of the class were not present in
the library.

With a library of 10⁷ ITs, we see that several classes
have been completely sampled, but that the αααααα class is
only 1.1% sampled. At 7.6·10⁷ ITs, we expect display of 50% of
all amino-acid sequences, but the classes containing three or
more amino acids of the α set are still poorly sampled. To
achieve complete sampling of the (NNK)₆ library requires about
3·10⁹ ITs, 10-fold larger than the largest (NNK)₆ library so
far reported.
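
The sampling percentages in the two preceding paragraphs are consistent with a simple Poisson model, sketched below. The model is an assumption made here for illustration (the tables themselves are not reproduced in this section): each transformant is treated as an independent draw from the 31⁶ stop-free DNA sequences, and the function names are hypothetical.

```python
from math import exp, factorial

TOTAL_DNA = 31 ** 6  # stop-free (NNK)6 DNA sequences

def fraction_sampled(n_alpha, n_phi, n_omega, n_its):
    """Expected fraction of one class seen among n_its transformants."""
    dna_per_peptide = 1**n_alpha * 2**n_phi * 3**n_omega
    return 1.0 - exp(-n_its * dna_per_peptide / TOTAL_DNA)

def fraction_of_all_peptides(n_its, length=6):
    """Expected fraction of all 20**length peptides displayed."""
    seen = 0.0
    for a in range(length + 1):
        for p in range(length + 1 - a):
            o = length - a - p
            ways = factorial(length) // (factorial(a) * factorial(p) * factorial(o))
            peptides = ways * 12**a * 5**p * 3**o
            seen += peptides * fraction_sampled(a, p, o, n_its)
    return seen / 20**length

print(fraction_sampled(0, 0, 6, 1e6))   # all-omega class at 10^6 ITs
print(fraction_sampled(6, 0, 0, 1e6))   # all-alpha class at 10^6 ITs
print(fraction_sampled(6, 0, 0, 1e7))   # all-alpha class at 10^7 ITs
print(fraction_of_all_peptides(7.6e7))  # whole library at 7.6*10^7 ITs
```

Under this model a peptide encoded by many DNA sequences is far more likely to be present than one encoded by a single sequence, which is why the all-α class lags so far behind.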

Table 131 shows expectations for a library encoded by
(NNT)₄(NNG)₂. The expectations of abundance are independent of
the order of the codons or of interspersed unvaried codons.
This library encodes 0.133 times as many amino-acid sequences,
but there are only 0.0165 times as many DNA sequences. Thus
5.0·10⁷ ITs (i.e. 60-fold fewer than required for (NNK)₆) gives
almost complete sampling of the library. The results would be
slightly better for (NNT)₆ and slightly, but not much, worse
for (NNG)₆. The controlling factor is the ratio of DNA
sequences to amino-acid sequences.

Table 132 shows the ratio of #DNA sequences/#AA sequences
for codons NNK, NNT, and NNG. For NNK and NNG, we have
assumed that the PBD is displayed as part of an essential
gene, such as gene III in Ff phage, as is indicated by the
phrase "assuming stops vanish". It is not in any way required
that such an essential gene be used. If a non-essential gene
is used, the analysis would be slightly different; sampling of
NNK and NNG would be slightly less efficient. Note that (NNT)₆
gives 3.6-fold more amino-acid sequences than (NNK)₅ but
requires 1.7-fold fewer DNA sequences. Note also that (NNT)₇
gives twice as many amino-acid sequences as (NNK)₆, but
3.3-fold fewer DNA sequences.
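
The library comparisons above reduce to products of per-codon counts. The sketch below assumes that stops vanish, so NNK contributes 31 DNA sequences and NNG 15; the dictionaries `AA_PER_CODON` and `DNA_PER_CODON` and the helper `library_size` are illustrative names.

```python
from math import prod

AA_PER_CODON = {"NNK": 20, "NNT": 15, "NNG": 13}
DNA_PER_CODON = {"NNK": 31, "NNT": 16, "NNG": 15}  # stop codons removed

def library_size(codons):
    """Return (amino-acid sequences, DNA sequences) for a list of codons."""
    return (prod(AA_PER_CODON[c] for c in codons),
            prod(DNA_PER_CODON[c] for c in codons))

nnk5 = library_size(["NNK"] * 5)
nnk6 = library_size(["NNK"] * 6)
nnt6 = library_size(["NNT"] * 6)
nnt7 = library_size(["NNT"] * 7)
mixed = library_size(["NNT"] * 4 + ["NNG"] * 2)

print(mixed[0] / nnk6[0], mixed[1] / nnk6[1])  # (NNT)4(NNG)2 relative to (NNK)6
print(nnt6[0] / nnk5[0], nnk5[1] / nnt6[1])    # (NNT)6 relative to (NNK)5
print(nnk6[1] / nnt7[1])                       # DNA ratio, (NNK)6 over (NNT)7
```

A library is sampled efficiently when its DNA count is close to its amino-acid count, which is the ratio tabulated in Table 132.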

Thus, while it is possible to use a simple mixture (NNS,
NNK or NNN) to obtain at a particular position all twenty
amino acids, these simple mixtures lead to a highly biased set
of encoded amino acids. This problem can be overcome by use
of complexly variegated codons.

Complexly Variegated Codons

Let Abun(x) be the abundance of DNA sequences coding for
amino acid x, defined by the distribution of nts at each base
of the codon. For any distribution, there will be a
most-favored amino acid (mfaa) with abundance Abun(mfaa) and a
least-favored amino acid (lfaa) with abundance Abun(lfaa). We
seek the nt distribution that allows all twenty amino acids
and that yields the largest ratio Abun(lfaa)/Abun(mfaa)
subject, if desirable, to further constraints.

We first will present the mixture calculated to be
optimal when the nt distribution is subject to two
constraints: equal abundances of acidic and basic amino acids
and the least possible number of stop codons. Thus only nt
distributions that yield Abun(E) + Abun(D) = Abun(R) + Abun(K) are
considered, and the function maximized is:

    {(1 - Abun(stop)) (Abun(lfaa)/Abun(mfaa))}.

We have simplified the search for an optimal nt distribution
by limiting the third base to T or G (C or G is equivalent).
All amino acids are possible and the number of accessible stop
codons is reduced because TGA and TAA codons are eliminated.
The amino acids F, Y, C, H, N, I, and D require T at the third
base while W, M, Q, K, and E require G. Thus we use an
equimolar mixture of T and G at the third base. However, it
should be noted that the present invention embraces use of
complexly variegated codons in which the third base is not
limited to T or G (or to C or G).

A computer program, written as part of the present
invention and named "Find Optimum vgCodon" (See Table 9),
varies the composition at bases 1 and 2, in steps of 0.05, and
reports the composition that gives the largest value of the
quantity {(Abun(lfaa)/Abun(mfaa)) (1 - Abun(stop))}. A vg codon
is symbolically defined by the nucleotide distribution at each
base:

              T      C      A      G
base #1 =     t1     c1     a1     g1
base #2 =     t2     c2     a2     g2
base #3 =     0.5    0.0    0.0    0.5

The variation of the quantities t1, c1, a1, g1, t2, c2, a2,
and g2 is subject to the constraint that:

    Abun(E) + Abun(D) = Abun(K) + Abun(R)

    Abun(E) + Abun(D) = g1*a2

    Abun(K) + Abun(R) = a1*a2/2 + c1*g2 + a1*g2/2

    g1*a2 = a1*a2/2 + c1*g2 + a1*g2/2

Solving for g2, we obtain

    g2 = (g1*a2 - 0.5*a1*a2) / (c1 + 0.5*a1)

In addition,

    t1 = 1 - a1 - c1 - g1
    t2 = 1 - a2 - c2 - g2

We vary a1, c1, g1, a2, and c2 and then calculate t1, g2, and
t2. Initially, variation is in steps of 5%. Once an
approximately optimum distribution of nucleotides is
determined, the region is further explored with steps of 1%.
The logic of this program is shown in Table 9. The optimum
distribution (the "fxS" codon) is shown in Table 10A and
yields DNA molecules encoding each type of amino acid with the
abundances shown.
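
The search described above can be sketched in a few lines. What follows is not the program of Table 9 but an illustrative reimplementation under the stated constraints (third base equimolar T and G, acidic abundance equal to basic abundance, g2 obtained from the constraint); the names `abundances`, `objective`, and `search`, and the `divisions` parameter, are assumptions made here.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; "*" marks a stop codon.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
THIRD = {"T": 0.5, "C": 0.0, "A": 0.0, "G": 0.5}

def abundances(first, second, third):
    """Map each residue (or '*') to the fraction of DNA that encodes it."""
    abun = {}
    for (i, b1), (j, b2), (k, b3) in product(enumerate(BASES), repeat=3):
        aa = AMINO[16 * i + 4 * j + k]
        abun[aa] = abun.get(aa, 0.0) + first[b1] * second[b2] * third[b3]
    return abun

def objective(abun):
    """(1 - Abun(stop)) * Abun(lfaa) / Abun(mfaa)."""
    amino = [v for aa, v in abun.items() if aa != "*"]
    return (1.0 - abun["*"]) * min(amino) / max(amino)

def search(divisions=20):
    """Grid search over a1, c1, g1, a2, c2 in steps of 1/divisions."""
    step = 1.0 / divisions
    grid = range(1, divisions)
    best = (0.0, None, None)
    for ia, ic, ig in product(grid, repeat=3):
        if ia + ic + ig >= divisions:
            continue  # t1 must stay positive
        a1, c1, g1 = ia * step, ic * step, ig * step
        first = {"T": 1.0 - a1 - c1 - g1, "C": c1, "A": a1, "G": g1}
        for ja, jc in product(grid, repeat=2):
            a2, c2 = ja * step, jc * step
            # g2 follows from Abun(E)+Abun(D) = Abun(K)+Abun(R)
            g2 = (g1 * a2 - 0.5 * a1 * a2) / (c1 + 0.5 * a1)
            t2 = 1.0 - a2 - c2 - g2
            if g2 <= 0.0 or t2 <= 0.0:
                continue
            second = {"T": t2, "C": c2, "A": a2, "G": g2}
            score = objective(abundances(first, second, THIRD))
            if score > best[0]:
                best = (score, first, second)
    return best
```

Calling `search(20)` corresponds to the 5% steps mentioned in the text and takes a few seconds in pure Python; a finer pass around the best point would then use 1% steps. For comparison, `objective` evaluated at equimolar NNK gives (31/32)/3.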

Note that this chemistry encodes all twenty amino acids,
with acidic and basic amino acids being equiprobable, and the
most favored amino acid (serine) is encoded only 2.454 times
as often as the least favored amino acid (tryptophan). The
"fxS" vg codon improves sampling most for peptides containing
several of the amino acids [F,Y,C,W,H,Q,I,M,N,K,D,E] for which
NNK or NNS provide only one codon. Its sampling advantages
are most pronounced when the library is relatively small.

A modification of "Find Optimum vgCodon" varies the
composition at bases 1 and 2, in steps of 0.01, and reports
the composition that gives the largest value of the quantity
{(Abun(lfaa)/Abun(mfaa))} without any restraint on the
relative abundance of any amino acids. The results of this
optimization are shown in Table 10B. The changes are small,
indicating that insisting on equality of acids and bases and
minimizing stop codons costs us little. Also note that,
without restraining the optimization, the prevalence of acidic
and basic amino acids comes out fairly close. On the other
hand, relaxing the restriction leaves a distribution in which
the least favored amino acid is only .412 times as prevalent
as SER.

The advantages of an NNT codon are discussed elsewhere in
the present application. Unoptimized NNT provides 15 amino
acids encoded by only 16 DNA sequences. It is possible to
improve on NNT as follows. First note that the SER codons
occur in the T and A rows of the genetic-code table and in the
C and G columns.

    [SER] = T₁ × C₂ + A₁ × G₂

If we reduce the prevalence of SER by reducing T₁, C₂, A₁, and
G₂ relative to other bases, then we will also reduce the
prevalence of PHE, TYR, CYS, PRO, THR, ALA, ARG, GLY, ILE, and
ASN. The prevalence of LEU, HIS, VAL, and ASP will rise. If
we assume that T₁, C₂, A₁, and G₂ are all lowered to the same
extent and that C₁, G₁, T₂, and A₂ are increased by the same
amount, we can compute a shift that makes the prevalence of
SER equal the prevalences of LEU, HIS, VAL, and ASP. The
decreases in each of PHE, TYR, CYS, PRO, THR, ALA, ARG, GLY,
ILE, and ASN are not equal; CYS and THR are reduced more than
the others.

Let the distribution be

              T        C        A        G
base #1 =   .25-q    .25+q    .25-q    .25+q
base #2 =   .25+q    .25-q    .25+q    .25-q
base #3 =   1.00     0.0      0.0      0.0

Setting [SER] = [LEU] = [HIS] = [VAL] = [ASP] gives:

    (.25-q)·(.25-q) + (.25-q)·(.25-q) = (.25+q)·(.25+q)
    2·(.25-q)² = (.25+q)²
    q² - 1.5q + .0625 = 0
    q = (3/4) - √2/2 ≈ .0429

This distribution (shown in Table 10C) gives five amino
acids (SER, LEU, HIS, VAL, ASP) in very nearly equal amounts.
A further eight amino acids (PHE, TYR, ILE, ASN, PRO, ALA,
ARG, GLY) are present at 78% the abundance of SER. THR and
CYS remain at half the abundance of SER. When variegating DNA
for disulfide-bonded mini-proteins, it is often desirable to
reduce the prevalence of CYS. This distribution allows 13
amino acids to be seen at high level and gives no stops; the
optimized fxS distribution allows only 11 amino acids at high
prevalence.
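
The shift q and the resulting abundances can be recomputed from the quadratic above. The sketch below is illustrative and assumes the standard genetic code; `abun` and the other names are chosen here. It solves q² - 1.5q + .0625 = 0 for the smaller root and evaluates each NNT-encoded residue.

```python
from math import sqrt

BASES = "TCAG"
# Standard genetic code in TCAG order; "*" marks a stop codon.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")

# Smaller root of q^2 - 1.5 q + 0.0625 = 0, i.e. q = 3/4 - sqrt(2)/2.
q = (1.5 - sqrt(1.5**2 - 4 * 0.0625)) / 2

first = {"T": 0.25 - q, "C": 0.25 + q, "A": 0.25 - q, "G": 0.25 + q}
second = {"T": 0.25 + q, "C": 0.25 - q, "A": 0.25 + q, "G": 0.25 - q}

# Third base is always T, which is index 0 within each block of four codons.
abun = {}
for i, b1 in enumerate(BASES):
    for j, b2 in enumerate(BASES):
        aa = AMINO[16 * i + 4 * j]
        abun[aa] = abun.get(aa, 0.0) + first[b1] * second[b2]

for aa, value in sorted(abun.items(), key=lambda item: -item[1]):
    print(f"{aa}: {value:.4f} ({value / abun['S']:.2f} x SER)")
```

The printout lists the residues in descending abundance, with SER, LEU, HIS, VAL, and ASP tied at the top and THR and CYS at half that level.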

The NNG codon can also be optimized. Table 10D shows an
approximately optimized NNG codon. When equimolar T,C,A,G are
used in NNG, one obtains double doses of LEU and ARG. To
improve the distribution, we increase G₁ by 4δ, decrease T₁ and
A₁ by δ each and C₁ by 2δ. We adopt this pattern because C₁
affects both LEU and ARG while T₁ and A₁ each affect either LEU
or ARG, but not both. Similarly, we decrease T₂ and G₂ by r
while we increase C₂ and A₂ by r. We adjusted δ and r until
[ALA] ≈ [ARG]. There are, under this variegation, four
equally most favored amino acids: LEU, ARG, ALA, and GLU.
Note that there is one acidic and one basic amino acid in this
set. There are two equally least favored amino acids: TRP
and MET. The ratio of lfaa/mfaa is 0.5258. If this codon is
repeated six times, peptides composed entirely of TRP and MET
are 2% as common as peptides composed entirely of the most
favored amino acids. We refer to this as "the prevalence of
(TRP/MET)₆ in optimized NNG₆ vgDNA".

When synthesizing vgDNA by the "dirty bottle" method, it
is sometimes desirable to use only a limited number of mixes.
One very useful mixture is called the "optimized NNS mixture"
in which we average the first two positions of the fxS
mixture: T₁ = 0.24, C₁ = 0.17, A₁ = 0.33, G₁ = 0.26, the second
position is identical to the first, C₃ = G₃ = 0.5. This
distribution provides the amino acids ARG, SER, LEU, GLY, VAL,
THR, ASN, and LYS at greater than 5% plus ALA, ASP, GLU, ILE,
MET, and TYR at greater than 4%.

An additional complexly variegated codon is of interest.
This codon is identical to the optimized NNT codon at the
first two positions and has T:G::90:10 at the third position.
This codon provides thirteen amino acids (ALA, ILE, ARG, SER,
ASP, LEU, VAL, PHE, ASN, GLY, PRO, TYR, and HIS) at more than
5.5%. THR at 4.3% and CYS at 3.9% are more common than the
LFAAs of NNK (3.125%). The remaining five amino acids are
present at less than 1%. This codon has the feature that all
amino acids are present; sequences having more than two of the
low-abundance amino acids are rare. When we isolate an SBD
using this codon, we can be reasonably sure that the first 13
amino acids were tested at each position. A similar codon,
based on optimized NNG, could be used.

Table 10E shows some properties of an unoptimized NNS (or
NNK) codon. Note that there are three equally most-favored
amino acids: ARG, LEU, and SER. There are also twelve
equally least favored amino acids: PHE, ILE, MET, TYR, HIS,
GLN, ASN, LYS, ASP, GLU, CYS, and TRP. Five amino acids (PRO,
THR, ALA, VAL, GLY) fall in between. Note that a six-fold
repetition of NNS gives sequences composed of the amino acids
[PHE, ILE, MET, TYR, HIS, GLN, ASN, LYS, ASP, GLU, CYS, and
TRP] at only ≈0.1% of the sequences composed of [ARG, LEU, and
SER]. Not only is this ≈20-fold lower than the prevalence of
(TRP/MET)₆ in optimized NNG₆ vgDNA, but this low prevalence
applies to twelve amino acids.
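
The two prevalence figures being compared are sixth powers of the per-codon lfaa/mfaa ratios. A minimal check, using the ratio 0.5258 quoted for optimized NNG and the 1:3 codon ratio of unoptimized NNS (variable names are illustrative):

```python
repeats = 6

nng_ratio = 0.5258   # lfaa/mfaa for the optimized NNG codon, from the text
nns_ratio = 1 / 3    # one codon versus three in unoptimized NNS or NNK

nng_prevalence = nng_ratio ** repeats
nns_prevalence = nns_ratio ** repeats

print(f"optimized NNG: {nng_prevalence:.4f}")
print(f"unoptimized NNS: {nns_prevalence:.5f}")
print(f"fold difference: {nng_prevalence / nns_prevalence:.1f}")
```

The per-codon ratios differ by well under a factor of two, yet over six positions the gap grows to more than an order of magnitude.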
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Diffuse Mutagenesis 

Diffuse Mutagenesis can be applied to any part of the 
protein at any time, but is most appropriate when some binding 
to the target has been established. Diffuse Mutagenesis can 
be accomplished by spiking each of the pure nts activated for 
DNA synthesis (e.g. nt-phosphoramidites) with a small amount 
of one or more of the other activated nts. 

Contrary to general practice, the present invention sets
the level of spiking so that only a small percentage (1% to
.00001%, for example) of the final product will contain the
initial DNA sequence. This will ensure that many single,
double, triple, and higher mutations occur, but that recovery
of the basic sequence will be a possible outcome. Let N_b be
the number of bases to be varied, and let Q be the fraction of
all sequences that should have the parental sequence, then M,
the fraction of the mixture that is the majority component, is

    M = exp{ log_e(Q)/N_b } = 10^(log_10(Q)/N_b).
If, for example, thirty base pairs on the DNA chain were to be
varied and 1% of the product is to have the parental sequence,
then each mixed nt substrate should contain 86% of the
parental nt and 14% of other nts. Table 8 shows the fraction
(fn) of DNA molecules having n non-parental bases when 30
bases are synthesized with reagents that contain fraction M of
the majority component. When M = 0.63096, f24 and higher are
less than 10^-8. The entry "most" in Table 8 is the number of
changes that has the highest probability. Note that
substantial probability for multiple substitutions only occurs
if the fraction of parental sequence (f0) is allowed to drop
to around 10^-6. The N_b base pairs of the DNA chain that are
synthesized with mixed reagents need not be contiguous. They
are picked so that between N_b/3 and N_b codons are affected to
various degrees. The residues picked for mutation are picked
with reference to the 3D structure of the IPBD, if known. For
example, one might pick all or most of the residues in the
principal and secondary set. We may impose restrictions on
the extent of variation at each of these residues based on
homologous sequences or other data. The mixture of
non-parental nts need not be random; rather, mixtures can be
biased to give particular amino acid types specific
probabilities of appearance at each codon. For example, one
residue may contain a hydrophobic amino acid in all known
homologous sequences; in such a case, the first and third base
of that codon would be varied, but the second would be set to
T. Other examples of how this might be done are given in the
horse heart myoglobin example. This diffuse
structure-directed mutagenesis will reveal the subtle changes
possible in protein backbone associated with conservative
interior changes, such as V to I, as well as some not so
subtle changes that require concomitant changes at two or more
residues of the protein.
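
The doping calculation above can be checked numerically. The
following is a minimal sketch in Python, not part of the original
disclosure. It assumes that each of the N_b mixed positions
independently keeps the parental base with probability M, so that
the number of non-parental bases follows a binomial distribution.
All function names are illustrative.

```python
import math


def majority_fraction(parental_fraction, n_bases):
    """M such that M**n_bases equals the desired parental fraction Q."""
    return parental_fraction ** (1.0 / n_bases)


def fraction_with_n_changes(m, n_bases, n):
    """Binomial fraction f_n of molecules with exactly n non-parental bases."""
    return math.comb(n_bases, n) * (1.0 - m) ** n * m ** (n_bases - n)


def most_probable_changes(m, n_bases):
    """Number of changes with the highest probability (the 'most' entry)."""
    return max(range(n_bases + 1),
               key=lambda n: fraction_with_n_changes(m, n_bases, n))


if __name__ == "__main__":
    m = majority_fraction(0.01, 30)  # thirty bases, 1% parental product
    print(f"M = {m:.4f}")
    for n in range(0, 9):
        print(n, f"{fraction_with_n_changes(m, 30, n):.4f}")
    print("most probable number of changes:", most_probable_changes(m, 30))
```

For Q = 0.01 and N_b = 30, majority_fraction returns about 0.858,
in line with the 86% figure quoted above.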

III.D. Special Considerations Relating to Variegation of
Mini-Proteins with Essential Cysteines

Several of the preferred simple or complex variegated
codons encode a set of amino acids which includes cysteine.
This means that some of the encoded binding domains will
feature one or more cysteines in addition to the invariant
disulfide-bonded cysteines. For example, at each NNT-encoded
position, there is a one in sixteen chance of obtaining
cysteine. If six codons are so varied, the fraction of
domains containing additional cysteines is about 0.32
(1 - (15/16)^6). Odd numbers of cysteines can lead to
complications, see Perry and Wetzel (PERR86). On the other
hand, many disulfide-containing proteins contain cysteines
that do not form disulfides, e.g. trypsin. The possibility of
unpaired cysteines can be dealt with in several ways:

First, the variegated phage population can be passed over
an immobilized reagent that strongly binds free thiols, such
as SulfoLink (catalogue number 44895 H from Pierce Chemical
Company, Rockford, Illinois, 61105). Another product from
Pierce is TNB-Thiol Agarose (Catalogue Code 20409 H). BioRad
sells Affi-Gel 401 (catalogue 153-4599) for this purpose.
Second, one can use a variegation that excludes cysteines,
such as:

NHT that gives [F, S, Y, L, P, H, I, T, N, V, A, D],
VNS that gives
[L2, P2, H, Q, R3, I, M, T2, N, K, S, V2, A2, E, D, G2],
NNG that gives [L2, S, W, P, Q, R2, M, T, K, V, A, E, G, stop],
SNT that gives [L, P, H, R, V, A, D, G],
RNG that gives [M, T, K, R, V, A, E, G],
RMG that gives [T, K, A, E],
VNT that gives [L, P, H, R, I, T, N, S, V, A, D, G], or
RRS that gives [N, S, K, R, D, E, G2].

However, each of these schemes has one or more of the
following disadvantages, relative to NNT: a) fewer amino acids
are allowed, b) amino acids are not evenly provided, c) acidic
and basic amino acids are not equally likely, or d) stop codons
occur. Nonetheless, NNG, NHT, and VNT are almost as useful as
NNT. NNG encodes 13 different amino acids and one stop
signal. Only two amino acids appear twice in the 16-fold mix.

Thirdly, one can enrich the population for binding to the
preselected target, and evaluate selected sequences post hoc
for extra cysteines. Those that contain more cysteines than
the cysteines provided for conformational constraint may be
perfectly usable. It is possible that a disulfide linkage
other than the designed one will occur. This does not mean
that the binding domain defined by the isolated DNA sequence
is in any way unsuitable. The suitability of the isolated
domains is best determined by chemical and biochemical
evaluation of chemically synthesized peptides.
Lastly, one can block free thiols with reagents, such as
Ellman's reagent, iodoacetate, or methyl iodide, that
specifically bind free thiols and that do not react with
disulfides, and then leave the modified phage in the
population. It is to be understood that the blocking agent
may alter the binding properties of the mini-protein; thus,
one might use a variety of blocking reagents in the
expectation that different binding domains will be found. The
variegated population of thiol-blocked genetic packages is
fractionated for binding. If the DNA sequence of the isolated
binding mini-protein contains an odd number of cysteines, then
synthetic means are used to prepare mini-proteins having each
possible linkage and in which the odd thiol is appropriately
blocked. Nishiuchi (NISH82, NISH86, and works cited therein)
discloses methods of synthesizing peptides that contain a
plurality of cysteines so that each thiol is protected with a
different type of blocking group. These groups can be
selectively removed so that the disulfide pairing can be
controlled. We envision using such a scheme with the
alteration that one thiol either remains blocked, or is
unblocked and then reblocked with a different reagent.
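
The amino acid sets listed for the cysteine-free codons can be
reproduced by expanding each variegated codon against the standard
genetic code. The sketch below is illustrative only. It assumes
equimolar nucleotide mixtures at every variegated base and the
IUPAC ambiguity codes (N, H, V, S, R, M, and so on); the helper
names expand and extra_cysteine_fraction are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, bases ordered T, C, A, G ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand(vg_codon):
    """Count how many DNA codons encode each amino acid for a variegated codon."""
    choices = (IUPAC[base] for base in vg_codon)
    return Counter(CODON_TABLE["".join(codon)] for codon in product(*choices))


def extra_cysteine_fraction(vg_codons):
    """Fraction of sequences with at least one cysteine at the varied positions."""
    p_none = 1.0
    for vg in vg_codons:
        counts = expand(vg)
        p_none *= 1.0 - counts["C"] / sum(counts.values())
    return 1.0 - p_none


if __name__ == "__main__":
    for vg in ["NNT", "NHT", "VNS", "NNG", "SNT", "RNG", "RMG", "VNT", "RRS"]:
        print(vg, dict(sorted(expand(vg).items())))
    print("six NNT codons:", round(extra_cysteine_fraction(["NNT"] * 6), 3))
```

Under these assumptions, expand("NNT") contains one cysteine codon
among sixteen, and extra_cysteine_fraction(["NNT"] * 6) evaluates
1 - (15/16)^6. Expanding NTN also confirms the earlier remark that
fixing the second base at T restricts a codon to hydrophobic
residues (F, L, I, M, V).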

III.E. Planning the Second and Later Rounds of Variegation

The method of the present invention allows efficient
accumulation of information concerning the amino-acid sequence
of a binding domain having high affinity for a predetermined
target. Although one may obtain a highly useful binding
domain from a single round of variegation and affinity
enrichment, we expect that multiple rounds will be needed to
achieve the highest possible affinity and specificity.

If the first round of variegation results in some binding
to the target, but the affinity for the target is still too
low, further improvement may be achieved by variegation of the
SBDs. Preferably, the process is progressive, i.e. each
variegation cycle produces a better starting point for the
next variegation cycle than the previous cycle produced.
Setting the level of variegation such that the ppbd and many
sequences related to the ppbd sequence are present in
detectable amounts ensures that the process is progressive.
If the level of variegation is so high that the ppbd sequence
is present at such low levels that there is an appreciable
chance that no transformant will display the PPBD, then the
best SBD of the next round could be worse than the PPBD. At
an excessively high level of variegation, each round of
mutagenesis is independent of previous rounds and there is no
assurance of progressivity. This approach can lead to
valuable binding proteins, but repetition of experiments with
this level of variegation will not yield progressive results.
Excessive variation is not preferred.

Progressivity is not an all-or-nothing property. So long
as most of the information obtained from previous variegation
cycles is retained and many different surfaces that are
related to the PPBD surface are produced, the process is
progressive. If the level of variegation is so high that the
ppbd gene may not be detected, the assurance of progressivity
diminishes. If the probability of recovering PPBD is
negligible, then the probability of progressive behavior is
also negligible.

A level of variegation that allows recovery of the PPBD
has two properties:

1) we cannot regress because the PPBD is available,

2) an enormous number of multiple changes related to the
PPBD are available for selection and we are able to
detect and benefit from these changes.

It is very unlikely that all of the variants will be worse
than the PPBD; we desire the presence of PPBD at detectable
levels to ensure that all the sequences present are indeed
related to PPBD.

An opposing force in our design considerations is that
PBDs are useful in the population only up to the amount that
can be detected; any excess above the detectable amount is
wasted. Thus we produce as many surfaces related to PPBD as
possible within the constraint that the PPBD be detectable.

If the level of variegation in the previous variegation
cycle was correctly chosen, then the amino acids selected to
be in the residues just varied are the ones best determined.
The environment of other residues has changed, so that it is
appropriate to vary them again. Because there are often more
residues in the principal and secondary sets than can be
varied simultaneously, we start by picking residues that
either have never been varied (highest priority) or that have
not been varied for one or more cycles. If we find that
varying all the residues except those varied in the previous
cycle does not allow a high enough level of diversity, then
residues varied in the previous cycle might be varied again.
For example, if M_ntv (the number of independent transformants
that can be produced from Y_D100 of DNA) and C_sensi (the
sensitivity of the affinity separation) were such that seven
residues could be varied, and if the principal and secondary
sets contained 13 residues, we would always vary seven
residues, even though that implies varying some residue twice
in a row. In such cases, we would pick the residues just
varied that contain the amino acids of highest abundance in
the variegated codons used.

It is the accumulation of information that allows the
process to select those protein sequences that produce binding
between the SBD and the target. Some interfaces between
proteins and other molecules involve twenty or more residues.
Complete variation of twenty residues would generate 10^26
different proteins. By dividing the residues that lie close
together in space into overlapping groups of five to seven
residues, we can vary a large surface but never need to test
more than 10^7 to 10^9 candidates at once, a savings of 10^19 to
10^17 fold. The power of selection with accumulation of
information is well illustrated in Chapter 3 of DAWK86.
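
The arithmetic behind these figures is easy to reproduce. The short
sketch below assumes that "complete variation" means all twenty
amino acids at each position; the function name is illustrative.

```python
import math


def n_sequences(n_residues, n_amino_acids=20):
    """Number of distinct protein sequences when every residue is fully varied."""
    return n_amino_acids ** n_residues


if __name__ == "__main__":
    full = n_sequences(20)
    print(f"20 residues at once: about 10^{math.log10(full):.1f}")
    for group in (5, 7):
        per_round = n_sequences(group)
        saving = full // per_round
        print(f"group of {group}: about 10^{math.log10(per_round):.1f} per round, "
              f"a saving of about 10^{math.log10(saving):.1f} fold")
```

Twenty residues give 20^20, about 1.05 x 10^26; groups of five and
seven give 3.2 x 10^6 and 1.28 x 10^9 candidates, so the figures in
the text are order-of-magnitude estimates.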

Use of NNT or NNG variegated codons leads to very 
efficient sampling of variegated libraries because the ratio 
of (different amino-acid sequences) / (different DNA sequences) 
is much closer to unity than it is for NNK or even the 
optimized vg codon (fxS) . Nevertheless, a few amino acids are 
omitted in each case. Both NNT and NNG allow members of all 
important classes of amino acids: hydrophobic, hydrophilic, 
acidic, basic, neutral hydrophilic, small, and large. After 
selecting a binding domain, a subsequent variegation and 
selection may be desirable to achieve a higher affinity or 
specificity. During this second variegation, amino acid 
possibilities overlooked by the preceding variegation may be 
investigated.

In the first round, we assume that the parental protein
has no known affinity for the target material. For example,
consider the parental mini-protein, similar to that discussed
in Example 11, having the structure X1-C2-X3-X4-X5-X6-C7-X8
(SEQ ID NO: 7) in which C2 and C7 form a disulfide bond.
Introduction of extra cysteines may cause alternative
structures to form which might be disadvantageous. Accidental
cysteines at positions 4 or 5 are thought to be potentially
more troublesome than at the other positions. We adopt the
pattern of variegation: X1:NNT, X3:NNT, X4:NNG, X5:NNG, X6:NNT,
and X8:NNT, so that cysteine cannot occur at positions 4 and 5.
(Table 131 shows the number of different amino acids
expected in libraries prepared with DNA variegated in this way
and comprising different numbers of independent
transformants.)
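
To make the sampling argument concrete, the sketch below compares
the number of distinct DNA sequences with the number of distinct
amino-acid sequences for a given pattern of variegated codons, and
estimates how many distinct DNA sequences a library of a given
number of independent transformants would contain. It is an
illustration only. The per-codon counts follow from the standard
genetic code (stop-containing sequences are left out of the protein
count). The Poisson sampling model with equally likely DNA sequences
is an assumption of this sketch and is not claimed to be the method
used to compute Table 131. All names are hypothetical.

```python
import math

# (number of DNA codons, number of distinct amino acids) per variegated codon
CODON_STATS = {"NNT": (16, 15), "NNG": (16, 13), "NNK": (32, 20)}


def library_sizes(pattern):
    """Return (distinct DNA sequences, distinct stop-free protein sequences)."""
    n_dna = math.prod(CODON_STATS[vg][0] for vg in pattern)
    n_protein = math.prod(CODON_STATS[vg][1] for vg in pattern)
    return n_dna, n_protein


def expected_distinct_dna(n_dna, n_transformants):
    """Expected distinct DNA sequences among independent transformants,
    assuming each transformant draws one of n_dna sequences uniformly."""
    return -n_dna * math.expm1(-n_transformants / n_dna)


if __name__ == "__main__":
    pattern = ["NNT", "NNT", "NNG", "NNG", "NNT", "NNT"]  # X1, X3, X4, X5, X6, X8
    for name, pat in [("NNT/NNG pattern", pattern), ("six NNK", ["NNK"] * 6)]:
        n_dna, n_protein = library_sizes(pat)
        print(name, n_dna, n_protein, round(n_protein / n_dna, 3))
    n_dna, _ = library_sizes(pattern)
    for transformants in (10**5, 10**6, 10**7, 10**8):
        print(transformants, round(expected_distinct_dna(n_dna, transformants)))
```

For the six-codon pattern above there are 16^6 = 16,777,216 DNA
sequences and 15^4 x 13^2 = 8,555,625 amino-acid sequences, a ratio
of about 0.51; six NNK codons give 32^6 DNA sequences for 20^6
amino-acid sequences, a ratio of about 0.06.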

In the second round of variegation, a preferred strategy
is to vary each position through a new set of residues which
includes the amino acid(s) which were found at that position
in the successful binding domains, and which includes as many
as possible of the residues which were excluded in the first
round of variegation.

A few examples may be helpful. Suppose we obtained PRO
using NNT. This amino acid is available with either NNT or
NNG. We can be reasonably sure that PRO is the best amino
acid from the set [PRO, LEU, VAL, THR, ALA, ARG, GLY, PHE,
TYR, CYS, HIS, ILE, ASN, ASP, SER]. Thus we need to try a set
that includes [PRO, TRP, GLN, MET, LYS, GLU]. The set allowed
by NNG is the preferred set.

What if we obtained HIS instead? Histidine is aromatic
and fairly hydrophobic and can form hydrogen bonds to and from
the imidazole ring. Tryptophan is hydrophobic and aromatic
and can donate a hydrogen to a suitable acceptor and was
excluded by the NNT codon. Methionine was also excluded and
is hydrophobic. Thus, one preferred course is to use the
variegated codon HDS that allows [HIS, GLN, ASN, LYS, TYR,
CYS, TRP, ARG, SER, LEU, ILE, MET, PHE, <stop>].

GLN can be encoded by the NNG codon. If GLN is selected,
at the next round we might use the vg codon VAS that encodes
three of the seven excluded possibilities, viz. HIS, ASN, and
ASP. The codon VAS encodes six amino acids in six DNA
sequences. This leaves PHE, CYS, TYR, and ILE untested, but
these are all very hydrophobic. Switching to NNT would be
undesirable because that would exclude GLN. One could use NAS
that includes TYR and <stop>. Suppose the successful amino
acid encoded by an NNG codon was ARG. Here we switch to NNT
because this allows ARG plus all the excluded possibilities.

THR is another possibility with the NNT codon. If THR is
selected, we switch to NNG because that includes the
previously excluded possibilities and includes THR. Suppose
the successful amino acid encoded by the NNT codon was ASP.
We use RRS at the next variegation because this includes both
acidic amino acids plus LYS and ARG. One could also use VRS
to allow GLN.
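
The reasoning in these examples follows a simple rule: the next
codon should still encode the selected amino acid and should cover
as many as possible of the amino acids that the previous codon could
not encode. A minimal sketch of that bookkeeping follows. It assumes
the standard genetic code and IUPAC ambiguity codes; untested_after
and evaluate are hypothetical helper names, not terms from this
specification.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
ALL_AMINO_ACIDS = set(AMINO) - {"*"}


def encoded(vg_codon):
    """Set of one-letter amino acids ('*' = stop) encoded by a variegated codon."""
    choices = (IUPAC[base] for base in vg_codon)
    return {CODON_TABLE["".join(codon)] for codon in product(*choices)}


def untested_after(previous_codon):
    """Amino acids that the previous round could not have presented."""
    return ALL_AMINO_ACIDS - encoded(previous_codon)


def evaluate(candidate, selected, previous_codon):
    """Summarize how well a candidate codon suits the next round."""
    amino_acids = encoded(candidate)
    return {
        "keeps_selected": selected in amino_acids,
        "newly_tested": sorted(amino_acids & untested_after(previous_codon)),
        "has_stop": "*" in amino_acids,
    }


if __name__ == "__main__":
    print(evaluate("NNG", "P", "NNT"))  # PRO selected from NNT
    print(evaluate("VAS", "Q", "NNG"))  # GLN selected from NNG
    print(evaluate("NNT", "R", "NNG"))  # ARG selected from NNG
    print(evaluate("RRS", "D", "NNT"))  # ASP selected from NNT
```

A candidate is attractive when keeps_selected is true and
newly_tested is long; has_stop flags codons such as NNG or NAS that
also admit a stop signal.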

Thus, later rounds of variegation test both amino acid
positions not previously mutated, and amino acid substitutions
at a previously mutated position which were not within the
previous substitution set.

If the first round of variegation is entirely
unsuccessful, a different pattern of variegation should be
used. For example, if more than one interaction set can be
defined within a domain, the residues varied in the next round
of variegation should be from a different set than that probed
in the initial variegation. If repeated failures are
encountered, one may switch to a different IPBD.




IV. DISPLAY STRATEGY: DISPLAYING FOREIGN BINDING DOMAINS ON
THE SURFACE OF A "GENETIC PACKAGE"

IV.A. General Requirements for Genetic Packages

It is emphasized that the GP on which
selection-through-binding will be practiced must be capable,
after the selection, either of growth in some suitable
environment or of in vitro amplification and recovery of the
encapsulated genetic message. During at least part of the
growth, the increase in number is preferably approximately
exponential with respect to time. The component of a
population that exhibits the desired binding properties may be
quite small, for example, one in 10^6 or less. Once this
component of the population is separated from the non-binding
components, it must be possible to amplify it. Culturing
viable cells is the most powerful amplification of genetic
material known and is preferred. Genetic messages can also be
amplified in vitro, e.g. by PCR, but this is not the most
preferred method.
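
As a rough illustration of why approximately exponential growth
matters, the sketch below counts the doublings needed to bring a
rare selected component back up to a workable number. It assumes
ideal exponential growth with a fixed doubling time; the function
names and the example numbers are hypothetical.

```python
import math


def doublings_needed(start_count, target_count):
    """Whole doublings required to grow from start_count to at least target_count."""
    if start_count <= 0 or target_count <= 0:
        raise ValueError("counts must be positive")
    return max(0, math.ceil(math.log2(target_count / start_count)))


def hours_needed(start_count, target_count, doubling_minutes):
    """Time for those doublings at a fixed doubling time, in hours."""
    return doublings_needed(start_count, target_count) * doubling_minutes / 60.0


if __name__ == "__main__":
    # One recovered package amplified a million-fold.
    print(doublings_needed(1, 10**6))
    for minutes in (20, 30, 40):
        print(minutes, "min doubling:", hours_needed(1, 10**6, minutes), "h")
```

Amplifying a single recovered package a million-fold takes 20
doublings under these assumptions, which is why a short generation
time is an advantage.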

Preferred GPs are vegetative bacterial cells, bacterial
spores and bacterial DNA viruses. Eukaryotic cells could be
used as genetic packages but have longer dividing times and
more stringent nutritional requirements than do bacteria and
it is much more difficult to produce a large number of
independent transformants. They are also more fragile than
bacterial cells and therefore more difficult to chromatograph
without damage. Eukaryotic viruses could be used instead of
bacteriophage but must be propagated in eukaryotic cells and
therefore suffer from some of the amplification problems
mentioned above.

Nonetheless, a strain of any living cell or virus is
potentially useful if the strain can be: 1) genetically
altered with reasonable facility to encode a potential binding
domain, 2) maintained and amplified in culture, 3) manipulated
to display the potential binding protein domain where it can
interact with the target material during affinity separation,
and 4) affinity separated while retaining the genetic
information encoding the displayed binding domain in
recoverable form. Preferably, the GP remains viable after
affinity separation.

When the genetic package is a bacterial cell, or a phage
which is assembled periplasmically, the display means has two
components. The first component is a secretion signal which
directs the initial expression product to the inner membrane
of the cell (a host cell when the package is a phage). This
secretion signal is cleaved off by a signal peptidase to yield
a processed, mature, potential binding protein. The second
component is an outer surface transport signal which directs
the package to assemble the processed protein into its outer
surface. Preferably, this outer surface transport signal is
derived from a surface protein native to the genetic package.

For example, in a preferred embodiment, the hybrid gene
comprises a DNA encoding a potential binding domain operably
linked to a signal sequence (e.g., the signal sequences of the
bacterial phoA or bla genes or the signal sequence of M13
phage gene III) and to DNA encoding a coat protein (e.g., the
M13 gene III or gene VIII proteins) of a filamentous phage
(e.g., M13). The expression product is transported to the
inner membrane (lipid bilayer) of the host cell, whereupon the
signal peptide is cleaved off to leave a processed hybrid
protein. The C-terminus of the coat protein-like component of
this hybrid protein is trapped in the lipid bilayer, so that
the hybrid protein does not escape into the periplasmic space.
(This is typical of the wild-type coat protein.) As the
single-stranded DNA of the nascent phage particle passes into
the periplasmic space, it collects both wild-type coat protein
and the hybrid protein from the lipid bilayer. The hybrid
protein is thus packaged into the surface sheath of the
filamentous phage, leaving the potential binding domain
exposed on its outer surface. (Thus, the filamentous phage,
not the host bacterial cell, is the "replicable genetic
package" in this embodiment.)

If a secretion signal is necessary for the display of the
potential binding domain, in an especially preferred
embodiment the bacterial cell in which the hybrid gene is
expressed is of a "secretion-permissive" strain.

When the genetic package is a bacterial spore, or a phage
whose coat is assembled intracellularly, a secretion signal
directing the expression product to the inner membrane of the
host bacterial cell is unnecessary. In these cases, the
display means is merely the outer surface transport signal,
typically a derivative of a spore or phage coat protein.

There are several methods of arranging that the ipbd gene
is expressed in such a manner that the IPBD is displayed on
the outer surface of the GP. If one or more fusions of
fragments of x genes to fragments of a natural osp gene are
known to cause X protein domains to appear on the GP surface,
then we pick the DNA sequence in which an ipbd gene fragment
replaces the x gene fragment in one of the successful osp-x
fusions as a preferred gene to be tested for the
display-of-IPBD phenotype. (The gene may be constructed in any
manner.) If no fusion data are available, then we fuse an ipbd
fragment to various fragments, such as fragments that end at
known or predicted domain boundaries, of the osp gene and
obtain GPs that display the osp-ipbd fusion on the GP outer
surface by screening or selection for the display-of-IPBD
phenotype. The OSP may be modified so as to increase the
flexibility and/or length of the linkage between the OSP and
the IPBD and thereby reduce interference between the two.

The fusion of ipbd and osp fragments may also include
fragments of random or pseudorandom DNA to produce a
population, members of which may display IPBD on the GP
surface. The members displaying IPBD are isolated by
screening or selection for the display-of-binding phenotype.

The replicable genetic entity (phage or plasmid) that
carries the osp-pbd genes (derived from the osp-ipbd gene)
through the selection-through-binding process is referred to
hereinafter as the operative cloning vector (OCV). When the
OCV is a phage, it may also serve as the genetic package. The
choice of a GP is dependent in part on the availability of a
suitable OCV and suitable OSP.

Preferably, the GP is readily stored, for example, by
freezing. If the GP is a cell, it should have a short
doubling time, such as 20-40 minutes. If the GP is a virus,
it should be prolific, e.g., a burst size of at least
100/infected cell. GPs which are finicky or expensive to
culture are disfavored. The GP should be easy to harvest,
preferably by centrifugation. The GP is preferably stable for
a temperature range of -70 to 42°C (stable at 4°C for several
days or weeks); resistant to shear forces found in HPLC;
insensitive to UV; tolerant of desiccation; and resistant to a
pH of 2.0 to 10.0, surface active agents such as SDS or
Triton, chaotropes such as 4M urea or 2M guanidinium HCl,
common ions such as K+, Na+, and SO4^2-, common organic
solvents such as ether and acetone, and degradative enzymes.
Finally, there must be a suitable OCV.

Although knowledge of specific OSPs may not be required
for vegetative bacterial cells and endospores, the user of the
present invention, preferably, will know: Is the sequence of
any osp known? (preferably yes, at least one required for
phage). How does the OSP arrive at the surface of GP?
(knowledge of route necessary, different routes have different
uses, no route preferred per se). Is the OSP
post-translationally processed? (no processing most preferred,
predictable processing preferred over unpredictable
processing). What rules are known governing this processing,
if there is any processing? (no processing most preferred,
predictable processing acceptable). What function does the
OSP serve in the outer surface? (preferably not essential).
Is the 3D structure of an OSP known? (highly preferred). Are
fusions between fragments of osp and a fragment of x known?
Does expression of these fusions lead to X appearing on the
surface of the GP? (fusion data is as preferred as knowledge
of a 3D structure). Is a "2D" structure of an OSP available?
(in this context, a "2D" structure indicates which residues
are exposed on the cell surface) (2D structure less preferred
than 3D structure). Where are the domain boundaries in the
OSP? (not as preferred as a 2D structure, but acceptable).
Could IPBD go through the same process as OSP and fold
correctly? (IPBD might need prosthetic groups) (preferably
IPBD will fold after same process). Is the sequence of an osp
promoter known? (preferably yes). Is an osp gene controlled by
a regulatable promoter available? (preferably yes). What
activates this promoter? (preferably a diffusible chemical,
such as IPTG). How many different OSPs do we know? (the more
the better). How many copies of each OSP are present on each
package? (more is better).

The user will want knowledge of the physical attributes
of the GP: How large is the GP? (knowledge useful in deciding
how to isolate GPs) (preferably easy to separate from soluble
proteins such as IgGs). What is the charge on the GP?
(neutral preferred). What is the sedimentation rate of the
GP? (knowledge preferred, no particular value preferred).

The preferred GP, OCV and OSP are those for which the
fewest serious obstacles can be seen, rather than those that
score highest on any one criterion.

Viruses are preferred over bacterial cells and spores
(cp. LUIT85 and references cited therein). The virus is
preferably a DNA virus with a genome size of 2 kb to 10 kb,
such as (but not limited to) the filamentous (Ff)
phage M13, fd, and f1 (inter alia see RASC86, BOEK80, BOEK82,
DAYL88, GRAY81b, KUHN88, LOPE85, WEBS85, MARV75, MARV80,
MOSE82, CRIS84, SMIT88a, SMIT88b); the IncN-specific phage
Ike and If1 (NAKA81, PEET85, PEET87, THOM83, THOM88a); the
IncP-specific Pseudomonas aeruginosa phage Pf1 (THOM83,
THOM88a) and Pf3 (LUIT83, LUIT85, LUTI87, THOM88a); and the
Xanthomonas oryzae phage Xf (THOM83, THOM88a). Filamentous
phage are especially preferred.

Preferred OSPs for several GPs are given in Table 2.
References to osp-ipbd fusions in this section should be taken
to apply, mutatis mutandis, to osp-pbd and osp-sbd fusions as
well.

The species chosen as a GP should have a
well-characterized genetic system and strains defective in
genetic recombination should be available. The chosen strain
may need to be manipulated to prevent changes of its
physiological state that would alter the number or type of
proteins or other molecules on the cell surface during the
affinity separation procedure.

IV.B. Phages for Use as GPs:

Unlike bacterial cells and spores, choice of a phage
depends strongly on knowledge of the 3D structure of an OSP
and how it interacts with other proteins in the capsid. This
does not mean that we need atomic resolution of the OSP, but
that we need to know which segments of the OSP interact to
make the viral coat and which segments are not constrained by
structural or functional roles. The size of the phage genome
and the packaging mechanism are also important because the
phage genome itself is the cloning vector. The osp-ipbd gene
is inserted into the phage genome; therefore: 1) the genome
of the phage must allow introduction of the osp-ipbd gene
either by tolerating additional genetic material or by having
replaceable genetic material; 2) the virion must be capable of
packaging the genome after accepting the insertion or
substitution of genetic material, and 3) the display of the
OSP-IPBD protein on the phage surface must not disrupt virion
structure sufficiently to interfere with phage propagation.

The morphogenetic pathway of the phage determines the
environment in which the IPBD will have opportunity to fold.
Periplasmically assembled phage are preferred when IPBDs
contain essential disulfides, as such IPBDs may not fold
within a cell (these proteins may fold after the phage is
released from the cell). Intracellularly assembled phage are
preferred when the IPBD needs large or insoluble prosthetic
groups (such as Fe4S4 clusters), since the IPBD may not fold if
secreted because the prosthetic group is lacking.

When variegation is introduced in Part II, multiple
infections could generate hybrid GPs that carry the gene for
one PBD but have at least some copies of a different PBD on
their surfaces; it is preferable to minimize this possibility
by infecting cells with phage under conditions resulting in a
low multiplicity of infection (MOI).
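
The benefit of a low MOI can be estimated with the usual Poisson
model of infection. That model is an assumption of this sketch and
not a statement from the specification: if phage adsorb to cells
independently, the number of phage infecting a cell is
Poisson-distributed with mean equal to the MOI. The function name is
illustrative.

```python
import math


def multiply_infected_fraction(moi):
    """Among infected cells, the fraction that received two or more phage,
    assuming Poisson-distributed infection with mean `moi`."""
    if moi <= 0:
        raise ValueError("moi must be positive")
    p_none = math.exp(-moi)   # cells with no phage
    p_one = moi * p_none      # cells with exactly one phage
    return (1.0 - p_none - p_one) / (1.0 - p_none)


if __name__ == "__main__":
    for moi in (0.01, 0.1, 1.0, 5.0):
        print(f"MOI {moi}: {multiply_infected_fraction(moi):.3f}")
```

Under this model, roughly 5% of infected cells carry more than one
phage at an MOI of 0.1, compared with about 42% at an MOI of 1, so
lowering the MOI sharply reduces the chance of hybrid packages.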

Bacteriophages are excellent candidates for GPs because
there is little or no enzymatic activity associated with
intact mature phage, and because the genes are inactive
outside a bacterial host, rendering the mature phage particles
metabolically inert.

The filamentous phages (e.g., M13) are of particular
interest.

For a given bacteriophage, the preferred OSP is usually
one that is present on the phage surface in the largest number
of copies, as this allows the greatest flexibility in varying
the ratio of OSP-IPBD to wild type OSP and also gives the
highest likelihood of obtaining satisfactory affinity
separation. Moreover, a protein present in only one or a few
copies usually performs an essential function in morphogenesis
or infection; mutating such a protein by addition or insertion
is likely to result in reduction in viability of the GP.
Nevertheless, an OSP such as M13 gIII protein may be an
excellent choice as OSP to cause display of the PBD.

It is preferred that the wild-type osp gene be preserved.
The ipbd gene fragment may be inserted either into a second
copy of the recipient osp gene or into a novel engineered osp
gene. It is preferred that the osp-ipbd gene be placed under
control of a regulated promoter. Our process forces the
evolution of the PBDs derived from IPBD so that some of them
develop a novel function, viz. binding to a chosen target.
Placing the gene that is subject to evolution on a duplicate
gene is an imitation of the widely-accepted scenario for the
evolution of protein families. It is now generally accepted
that gene duplication is the first step in the evolution of a
protein family from an ancestral protein. By having two
copies of a gene, the affected physiological process can
tolerate mutations in one of the genes. This process is well
understood and documented for the globin family (cf. DICK83,
p65ff, and CREI84, p117-125).

The user must choose a site in the candidate OSP gene for
inserting an ipbd gene fragment. The coats of most
bacteriophage are highly ordered. Filamentous phage can be
described by a helical lattice; isometric phage, by an
icosahedral lattice. Each monomer of each major coat protein
sits on a lattice point and makes defined interactions with
each of its neighbors. Proteins that fit into the lattice by
making some, but not all, of the normal lattice contacts are
likely to destabilize the virion by: a) aborting formation of
the virion, b) making the virion unstable, or c) leaving gaps
in the virion so that the nucleic acid is not protected. Thus
in bacteriophage, unlike the cases of bacteria and spores, it
is important to retain in engineered OSP-IPBD fusion proteins
those residues of the parental OSP that interact with other
proteins in the virion. For M13 gVIII, we retain the entire
mature protein, while for M13 gIII, it might suffice to retain
the last 100 residues (or even fewer). Such a truncated gIII
protein would be expressed in parallel with the complete gIII
protein, as gIII protein is required for phage infectivity.

Il'ichev et al. (ILIC89) have reported viable phage
having alterations in gene VIII. In one case, a point
mutation changed one amino acid near the amino terminus of the
mature gVIII protein from GLU to ASP. In the other case, five
amino acids were inserted at the site of the first mutation.
They suggested that similar constructions could be used for
vaccines. They did not report on any binding properties of
the modified phage, nor did they suggest mutagenizing the
inserted material. Furthermore, they did not insert a binding
domain, nor did they suggest inserting such a domain.

Further considerations on the design of the ipbd::osp
gene are discussed in section IV.F.

Filamentous phage:

Compared to other bacteriophage, filamentous phage in
general are attractive and M13 in particular is especially
attractive because: 1) the 3D structure of the virion is
known; 2) the processing of the coat protein is well
understood; 3) the genome is expandable; 4) the genome is
small; 5) the sequence of the genome is known; 6) the virion
is physically resistant to shear, heat, cold, urea,
guanidinium Cl, low pH, and high salt; 7) the phage is a
sequencing vector so that sequencing is especially easy; 8)
antibiotic-resistance genes have been cloned into the genome
with predictable results (HINE80); 9) it is easily cultured
and stored (FRIT85), with no unusual or expensive media
requirements for the infected cells; 10) it has a high burst
size, each infected cell yielding 100 to 1000 M13 progeny
after infection; and 11) it is easily harvested and
concentrated (SALI64, FRIT85).

The filamentous phage include M13, f1, fd, If1, IKe, Xf,
Pf1, and Pf3.

The entire life cycle of the filamentous phage M13, a
common cloning and sequencing vector, is well understood. M13
and f1 are so closely related that we consider the properties
of each relevant to both (RASC86); any differentiation is for
historical accuracy. The genetic structure (the complete
sequence (SCHA78), the identity and function of the ten genes,
and the order of transcription and location of the promoters)
of M13 is well known, as is the physical structure of the
virion (BANN81, BOEK80, CHAN79, ITOK79, KAPL78, KUHN85b,
KUHN87, MAKO80, MARV78, MESS78, OHKA81, RASC86, RUSS81,
SCHA78, SMIT85, WEBS78, and ZIMM82); see RASC86 for a recent
review of the structure and function of the coat proteins.
Because the genome is small (6423 bp), cassette mutagenesis is
practical on RF M13 (AUSU87), as is single-stranded
oligonucleotide-directed mutagenesis (FRIT85). M13 is a plasmid and
transformation system in itself, and an ideal sequencing
vector. M13 can be grown on Rec- strains of E. coli. The M13
genome is expandable (MESS78, FRIT85) and M13 does not lyse
cells. Because the M13 genome is extruded through the
membrane and coated by a large number of identical protein
molecules, it can be used as a cloning vector (WATS87 p278,
and MESS77). Thus we can insert extra genes into M13 and they
will be carried along in a stable manner.

Marvin and collaborators (MARV78, MAKO80, BANN81) have
determined an approximate 3D virion structure of f1 by a
combination of genetics, biochemistry, and X-ray diffraction
from fibers of the virus. Figure 4 is drawn after the model
of Banner et al. (BANN81) and shows only the Cαs of the
protein. The apparent holes in the cylindrical sheath are
actually filled by protein side groups so that the DNA within
is protected. The amino terminus of each protein monomer is
to the outside of the cylinder, while the carboxy terminus is
at smaller radius, near the DNA. Although other filamentous
phages (e.g. Pf1 or IKe) have different helical symmetry, all
have coats composed of many short α-helical monomers with the
amino terminus of each monomer on the virion surface.

The major coat protein is encoded by gene VIII. The 50
amino acid mature gene VIII coat protein is synthesized as a
73 amino acid precoat (ITOK79). The first 23 amino acids
constitute a typical signal sequence which causes the nascent
polypeptide to be inserted into the inner cell membrane.
Whether the precoat inserts into the membrane by itself or
through the action of host secretion components, such as SecA
and SecY, remains controversial, but has no effect on the
operation of the present invention.

An E. coli signal peptidase (SP-I) recognizes amino acids
18, 21, and 23, and, to a lesser extent, residue 22, and cuts
between residues 23 and 24 of the precoat (KUHN85a, KUHN85b,
OLIV87). After removal of the signal sequence, the amino
terminus of the mature coat is located on the periplasmic side
of the inner membrane; the carboxy terminus is on the
cytoplasmic side. About 3000 copies of the mature 50 amino
acid coat protein associate side-by-side in the inner
membrane.

The sequence of gene VIII is known, and the amino acid
sequence can be encoded on a synthetic gene, using the lacUV5
promoter in conjunction with the LacIq repressor. The
lacUV5 promoter is induced by IPTG. Mature gene VIII protein
makes up the sheath around the circular ssDNA. The 3D
structure of the f1 virion is known at medium resolution; the
amino terminus of gene VIII protein is on the surface of the
virion. A few modifications of gene VIII have been made and
are discussed below. The 2D structure of M13 coat protein is
implicit in the 3D structure. Mature M13 gene VIII protein
has only one domain.

When the GP is M13, the gene III and the gene VIII
proteins are highly preferred as OSP (see Examples I through
IV). The proteins from genes VI, VII, and IX may also be
used.

As discussed in the Examples, we have constructed a
tripartite gene comprising:

1) DNA encoding a signal sequence directing secretion of
parts (2) and (3) through the inner membrane,

2) DNA encoding the mature BPTI sequence, and

3) DNA encoding the mature M13 gVIII protein.

This gene causes BPTI to appear in active form on the surface
of M13 phage.

The gene VIII protein is a preferred OSP because it is
present in many copies and because its location and
orientation in the virion are known (BANN81). Preferably, the
PBD is attached to the amino terminus of the mature M13 coat
protein. Had direct fusion of PBD to M13 CP failed to cause
PBD to be displayed on the surface of M13, we would have
varied part of the mini-protein sequence and/or inserted short
random or nonrandom spacer sequences between mini-protein and
M13 CP. The 3D model of f1 indicates strongly that fusing
IPBD to the amino terminus of M13 CP is more likely to yield a
functional chimeric protein than any other fusion site.

Similar constructions could be made with other
filamentous phage. Pf3 is a well known filamentous phage that
infects Pseudomonas aeruginosa cells that harbor an IncP-1
plasmid. The entire genome has been sequenced (LUIT85) and
the genetic signals involved in replication and assembly are
known (LUIT87). The major coat protein of Pf3 is unusual in
having no signal peptide to direct its secretion. The
sequence has charged residues ASP7, ARG37, LYS40, and PHE44-COO-,
which is consistent with the amino terminus being exposed.
Thus, to cause an IPBD to appear on the surface of Pf3, we
construct a tripartite gene comprising:

1) a signal sequence known to cause secretion in P.
aeruginosa (preferably known to cause secretion of IPBD)
fused in-frame to,

2) a gene fragment encoding the IPBD sequence, fused
in-frame to,

3) DNA encoding the mature Pf3 coat protein.

Optionally, DNA encoding a flexible linker of one to 10
amino acids is introduced between the ipbd gene fragment and
the Pf3 coat-protein gene. Optionally, DNA encoding the
recognition site for a specific protease, such as tissue
plasminogen activator or blood clotting Factor Xa, is
introduced between the ipbd gene fragment and the Pf3
coat-protein gene. Amino acids that form the recognition site for
a specific protease may also serve the function of a flexible
linker. This tripartite gene is introduced into Pf3 so that
it does not interfere with expression of any Pf3 genes. To
reduce the possibility of genetic recombination, part (3) is
designed to have numerous silent mutations relative to the
wild-type gene. Once the signal sequence is cleaved off, the
IPBD is in the periplasm and the mature coat protein acts as
an anchor and phage-assembly signal. It matters not that this
fusion protein comes to rest anchored in the lipid bilayer by
a route different from the route followed by the wild-type
coat protein.

The amino-acid sequence of M13 pre-coat (SCHA78), called
AA_seq1, is

(SEQ. ID NO: 237)
AA_seq1

MKKSLVLKASVAVATLVPMLSFA ↓ AEGDDPAKAAFNSLQASATEYIGYAWA   (residues 1-23 ↓ 24-50)
MVVVIVGATIGIKLFKKFTSKAS   (residues 51-73)

The single-letter codes for amino acids and the codes for
ambiguous DNA are given in Table 1. The best site for
inserting a novel protein domain into M13 CP is after A23
because SP-I cleaves the precoat protein after A23, as
indicated by the arrow. Proteins that can be secreted will
appear connected to mature M13 CP at its amino terminus.
Because the amino terminus of mature M13 CP is located on the
outer surface of the virion, the introduced domain will be
displayed on the outside of the virion. The uncertainty of
the mechanism by which M13 CP appears in the lipid bilayer
raises the possibility that direct insertion of bpti into gene
VIII may not yield a functional fusion protein. It may be
necessary to change the signal sequence of the fusion to, for
example, the phoA signal sequence
(MKQSTIALALLPLLFTPVTKA) (SEQ ID NO: 127). Marks et al.
(MARK86) showed that the phoA signal peptide could direct
mature BPTI to the E. coli periplasm.
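The placement of a domain after A23 can be illustrated at the protein level with a short sketch. It assumes that residues 51-56 of the precoat read MVVVIV, so that the precoat totals 73 residues and the mature coat 50 residues as stated in the text. The function names and the placeholder domain "XXXX" are hypothetical and are used only for illustration; no real binding domain sequence is implied.

```python
# Sketch: protein-level assembly of signal :: domain :: mature M13 coat.
# Assumption: the 73-residue precoat below (residues 51-56 read as MVVVIV).
M13_PRECOAT = (
    "MKKSLVLKASVAVATLVPMLSFA"                             # signal, residues 1-23
    "AEGDDPAKAAFNSLQASATEYIGYAWAMVVVIVGATIGIKLFKKFTSKAS"  # mature coat, 24-73
)
SIGNAL_LENGTH = 23                     # SP-I cuts between residues 23 and 24
PHOA_SIGNAL = "MKQSTIALALLPLLFTPVTKA"  # phoA signal, SEQ ID NO: 127


def split_precoat(precoat=M13_PRECOAT, signal_length=SIGNAL_LENGTH):
    """Return (signal peptide, mature coat protein)."""
    return precoat[:signal_length], precoat[signal_length:]


def build_fusion(domain, signal=None, linker=""):
    """Place `domain` (plus an optional linker) between a signal and mature CP.

    If `signal` is None the native gene VIII signal is kept; otherwise the
    supplied signal (for example PHOA_SIGNAL) replaces it.
    """
    native_signal, mature_coat = split_precoat()
    leader = native_signal if signal is None else signal
    return leader + domain + linker + mature_coat


def processed_product(fusion, signal):
    """Protein remaining after the signal is removed by signal peptidase."""
    if not fusion.startswith(signal):
        raise ValueError("fusion does not begin with the given signal")
    return fusion[len(signal):]


if __name__ == "__main__":
    signal, mature = split_precoat()
    fusion = build_fusion("XXXX", signal=PHOA_SIGNAL, linker="GG")
    print(len(signal), len(mature))
    print(processed_product(fusion, PHOA_SIGNAL))
```

After processing, the placeholder domain sits at the amino terminus of the mature coat protein, which is the arrangement the text expects to be displayed on the virion surface.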

Another vehicle for displaying the IPBD is by expressing
it as a domain of a chimeric gene containing part or all of
gene III. This gene encodes one of the minor coat proteins of
M13. Genes VI, VII, and IX also encode minor coat proteins.
Each of these minor proteins is present in about 5 copies per
virion and is related to morphogenesis or infection. In
contrast, the major coat protein is present in more than 2500
copies per virion. The gene VI, VII, and IX proteins are
present at the ends of the virion; these three proteins are
not post-translationally processed (RASC86).

The single-stranded circular phage DNA associates with
about five copies of the gene III protein and is then extruded
through the patch of membrane-associated coat protein in such
a way that the DNA is encased in a helical sheath of protein
(WEBS78). The DNA does not base pair (that would impose
severe restrictions on the virus genome); rather the bases
intercalate with each other independent of sequence.

Smith (SMIT85) and de la Cruz et al. (DELA88) have shown
that insertions into gene III cause novel protein domains to
appear on the virion outer surface. The mini-protein's gene
may be fused to gene III at the site used by Smith and by de
la Cruz et al., at a codon corresponding to another domain
boundary or to a surface loop of the protein, or to the amino
terminus of the mature protein.

All published works use a vector containing a single
modified gene III of fd. Thus, all five copies of gIII are
identically modified. Gene III is quite large (1272 b.p. or
about 20% of the phage genome) and it is uncertain whether a
duplicate of the whole gene can be stably inserted into the
phage. Furthermore, all five copies of gIII protein are at
one end of the virion. When bivalent target molecules (such
as antibodies) bind a pentavalent phage, the resulting complex
may be irreversible. Irreversible binding of the GP to the
target greatly interferes with affinity enrichment of the GPs
that carry the genetic sequences encoding the novel
polypeptide having the highest affinity for the target.

To reduce the likelihood of formation of irreversible
complexes, we may use a second, synthetic gene that encodes
carboxy-terminal parts of III. We might, for example,
engineer a gene that consists of (from 5' to 3'):

1) a promoter (preferably regulated),

2) a ribosome-binding site,

3) an initiation codon,

4) a functional signal peptide directing secretion of parts
(5) and (6) through the inner membrane,

5) DNA encoding an IPBD,

6) DNA encoding residues 275 through 424 of M13 gIII
protein,

7) a translation stop codon, and

8) (optionally) a transcription stop signal.

We leave the wild-type gene III so that some unaltered gene
III protein will be present. Alternatively, we may use gene
VIII protein as the OSP and regulate the osp::ipbd fusion so
that only one or a few copies of the fusion protein appear on
the phage.
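The size argument for using only a carboxy-terminal part of gene III can be made concrete with a little arithmetic. The sketch below uses only figures given in the text (a 6423 bp genome, a 1272 bp gene III, and residues 275 through 424) and assumes three nucleotides per residue with stop codons and regulatory elements ignored. The variable and function names are illustrative only.

```python
# Sketch: how much DNA a second copy of gene III would add, whole vs. fragment.
M13_GENOME_BP = 6423          # genome size given in the text
GENE_III_BP = 1272            # size of gene III given in the text
FIRST_RES, LAST_RES = 275, 424

# Order of elements in the synthetic gene, 5' to 3', as listed above.
CASSETTE = (
    "promoter",
    "ribosome-binding site",
    "initiation codon",
    "signal peptide",
    "DNA encoding an IPBD",
    "DNA encoding gIII residues 275-424",
    "translation stop codon",
    "transcription stop signal (optional)",
)


def inclusive_length(first, last):
    """Number of residues in the closed interval [first, last]."""
    return last - first + 1


def coding_bp(residues):
    """Coding length in base pairs, three nucleotides per residue."""
    return 3 * residues


fragment_residues = inclusive_length(FIRST_RES, LAST_RES)
fragment_bp = coding_bp(fragment_residues)
whole_fraction = GENE_III_BP / M13_GENOME_BP
fragment_fraction = fragment_bp / M13_GENOME_BP

if __name__ == "__main__":
    print(f"fragment: {fragment_residues} residues, {fragment_bp} bp")
    print(f"whole gene III: {whole_fraction:.1%} of genome")
    print(f"fragment only:  {fragment_fraction:.1%} of genome")
```

On these assumptions the gIII fragment in part (6) adds roughly a third of the DNA that a full duplicate of gene III would add, which is consistent with the concern about stable insertion of the whole gene.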
M13 gene VI, VII, and IX proteins are not processed after
translation. The route by which these proteins are assembled
into the phage has not been reported. These proteins are
necessary for normal morphogenesis and infectivity of the
phage. Whether these molecules (gene VI protein, gene VII
protein, and gene IX protein) attach themselves to the phage:
a) from the cytoplasm, b) from the periplasm, or c) from
within the lipid bilayer, is not known. One could use any of
these proteins to introduce an IPBD onto the phage surface by
one of the constructions:

1) ipbd::pmcp,

2) pmcp::ipbd,

3) signal::ipbd::pmcp, and

4) signal::pmcp::ipbd,

where ipbd represents DNA coding on expression for the initial
potential binding domain; pmcp represents DNA coding for one
of the phage minor coat proteins, VI, VII, and IX; signal
represents a functional secretion signal peptide, such as the
phoA signal (MKQSTIALALLPLLFTPVTKA) (SEQ ID NO: 127); and "::"
represents in-frame genetic fusion. The indicated fusions are
placed downstream of a known promoter, preferably a regulated
promoter such as lacUV5, tac, or trp. Fusions (1) and (2) are
appropriate when the minor coat protein attaches to the phage
from the cytoplasm or by autonomous insertion into the lipid
bilayer. Fusion (1) is appropriate if the amino terminus of
the minor coat protein is free and (2) is appropriate if the
carboxy terminus is free. Fusions (3) and (4) are appropriate
if the minor coat protein attaches to the phage from the
periplasm or from within the lipid bilayer. Fusion (3) is
appropriate if the amino terminus of the minor coat protein is
free and (4) is appropriate if the carboxy terminus is free.
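The choice among the four constructions reduces to two questions: does the minor coat protein reach the phage by a route that calls for a secretion signal, and which of its termini is free. A minimal sketch of that decision follows. The route labels and the function name are hypothetical; the mapping itself follows the rules stated above.

```python
# Sketch: pick one of the four constructions from the stated rules.
# Route labels are illustrative names for the cases described in the text.
SIGNAL_REQUIRED = {
    "cytoplasm": False,             # fusions (1) and (2)
    "autonomous_insertion": False,  # fusions (1) and (2)
    "periplasm": True,              # fusions (3) and (4)
    "lipid_bilayer": True,          # fusions (3) and (4)
}


def choose_fusion(attachment_route, free_terminus):
    """Return the construction, written with '::' for in-frame fusion."""
    if attachment_route not in SIGNAL_REQUIRED:
        raise ValueError(f"unknown attachment route: {attachment_route!r}")
    if free_terminus not in ("amino", "carboxy"):
        raise ValueError("free_terminus must be 'amino' or 'carboxy'")
    # The IPBD is fused to whichever terminus of the coat protein is free.
    if free_terminus == "amino":
        parts = ["ipbd", "pmcp"]
    else:
        parts = ["pmcp", "ipbd"]
    if SIGNAL_REQUIRED[attachment_route]:
        parts.insert(0, "signal")
    return "::".join(parts)


if __name__ == "__main__":
    for route in SIGNAL_REQUIRED:
        for terminus in ("amino", "carboxy"):
            print(route, terminus, choose_fusion(route, terminus))
```

Because the attachment route of the gene VI, VII, and IX proteins is not known, in practice all four constructions might be made and tested; the sketch only records which construction corresponds to which hypothesis.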
Bacteriophage φX174:

The bacteriophage φX174 is a very small icosahedral virus
which has been thoroughly studied by genetics, biochemistry,
and electron microscopy (see The Single-Stranded DNA Phages
(DENH78)). To date, no proteins from φX174 have been studied
by X-ray diffraction. φX174 is not used as a cloning vector
because φX174 can accept very little additional DNA; the virus
is so tightly constrained that several of its genes overlap.
Chambers et al. (CHAM82) showed that mutants in gene G are
rescued by the wild-type G gene carried on a plasmid so that
the host supplies this protein.

Three gene products of φX174 are present on the outside
of the mature virion: F (capsid), G (major spike protein, 60
copies per virion), and H (minor spike protein, 12 copies per
virion). The G protein comprises 175 amino acids, while H
comprises 328 amino acids. The F protein interacts with the
single-stranded DNA of the virus. The proteins F, G, and H
are translated from a single mRNA in the virus-infected cells.
If the G protein is supplied from a plasmid in the host, then
the viral g gene is no longer essential. We introduce one or
more stop codons into g so that no G is produced from the
viral gene. We fuse a pbd gene fragment to h, either at the
3' or 5' terminus. We eliminate an amount of the viral g gene
equal to the size of pbd so that the size of the genome is
unchanged.
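The bookkeeping behind keeping the φX174 genome at constant size can be sketched as follows. The sketch assumes three nucleotides per residue, takes the 175-residue length of G from the text as the amount of g available for deletion, and ignores stop codons and any overlap of g with other genes. The function name and the 58-residue example domain are hypothetical.

```python
# Sketch: delete from g exactly as much DNA as the pbd insert adds to h.
G_PROTEIN_RESIDUES = 175  # length of the G protein given in the text


def bp_for(residues):
    """Coding length in base pairs, three nucleotides per residue."""
    return 3 * residues


def g_deletion_plan(pbd_residues):
    """Return the sizes involved in a size-neutral pbd insertion."""
    insert_bp = bp_for(pbd_residues)
    g_bp = bp_for(G_PROTEIN_RESIDUES)
    if insert_bp > g_bp:
        # g alone cannot compensate for an insert this large.
        raise ValueError("pbd is larger than the g coding region")
    return {
        "insert_bp": insert_bp,
        "delete_from_g_bp": insert_bp,
        "g_bp_remaining": g_bp - insert_bp,
        "net_change_bp": 0,
    }


if __name__ == "__main__":
    # Hypothetical 58-residue binding domain.
    print(g_deletion_plan(58))
```

The check on the insert size reflects the practical limit implied by the text: since φX174 accepts very little additional DNA, a domain longer than G itself could not be accommodated by this compensation alone.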

Large DNA Phages

Phage such as λ or T4 have much larger genomes than do
M13 or φX174. Large genomes are less conveniently manipulated
than small genomes. Phage λ has such a large genome that
cassette mutagenesis is not practicable. One cannot use
annealing of a mutagenic oligonucleotide either, because there
is no ready supply of single-stranded λ DNA. (λ DNA is
packaged as double-stranded DNA.) Phage such as λ and T4 have
more complicated 3D capsid structures than M13 or φX174, with
more OSPs to choose from. Intracellular morphogenesis of
phage λ could cause protein domains that contain disulfide
bonds in their folded forms not to fold.

Phage λ virions and phage T4 virions form
intracellularly, so that IPBDs requiring large or insoluble
prosthetic groups might fold on the surfaces of these phage.

RNA Phages

RNA phage are not preferred because manipulation of RNA
is much less convenient than is the manipulation of DNA. If
the RNA phage MS2 were modified to make room for an osp-ipbd
gene and if a message containing the A protein binding site
and the gene for a chimera of coat protein and a PBD were
produced in a cell that also contained A protein and wild-type
coat protein (both produced from regulated genes on a
plasmid), then the RNA coding for the chimeric protein would
get packaged. A package comprising RNA encapsulated by
proteins encoded by that RNA satisfies the major criterion
that the genetic message inside the package specifies
something on the outside. The particles by themselves are not
viable unless the modified A protein is functional. After
isolating the packages that carry an SBD, we would need to:
1) separate the RNA from the protein capsid; 2) reverse
transcribe the RNA into DNA, using AMV or MMTV reverse
transcriptase; and 3) use Thermus aquaticus DNA polymerase for
25 or more cycles of Polymerase Chain Reaction™ to amplify
the osp-sbd DNA until there is enough to subclone the
recovered genetic message into a plasmid for sequencing and
further work.
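The figure of 25 or more cycles in step (3) can be related to the amount of template recovered. The sketch below assumes the idealized model in which each cycle multiplies the copy number by (1 + efficiency), with an efficiency of 1.0 meaning perfect doubling; real reactions fall short of this. The function names and the example copy numbers are illustrative only.

```python
# Sketch: idealized PCR amplification, copies multiply by (1 + efficiency) per cycle.
def copies_after_pcr(start_copies, cycles, efficiency=1.0):
    """Expected copy number after `cycles` cycles under the idealized model."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return start_copies * (1.0 + efficiency) ** cycles


def cycles_needed(start_copies, target_copies, efficiency=1.0):
    """Smallest number of cycles that reaches `target_copies`."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    copies = float(start_copies)
    cycles = 0
    while copies < target_copies:
        copies *= 1.0 + efficiency
        cycles += 1
    return cycles


if __name__ == "__main__":
    print(copies_after_pcr(1, 25))            # perfect doubling for 25 cycles
    print(cycles_needed(1000, 10**9))         # hypothetical start and target
    print(cycles_needed(1000, 10**9, 0.8))    # same, at 80% efficiency
```

Under perfect doubling, 25 cycles give about a 3 x 10^7-fold amplification of each recovered template; at lower efficiency more cycles are needed, which is in keeping with the "or more" in the text.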

Alternatively, helper phage could be used to rescue the
isolated phage. In one of these ways we can recover a
sequence that codes for an SBD having desirable binding
properties.

IV.C. Bacterial Cells as Genetic Packages:

One may choose any well-characterized bacterial strain
which (1) may be grown in culture, (2) may be engineered to
display PBDs on its surface, and (3) is compatible with
affinity selection.

Among bacterial cells, the preferred genetic packages are
Salmonella typhimurium, Bacillus subtilis, Pseudomonas
aeruginosa, Vibrio cholerae, Klebsiella pneumoniae, Neisseria
gonorrhoeae, Neisseria meningitidis, Bacteroides nodosus,
Moraxella bovis, and especially Escherichia coli. The
potential binding mini-protein may be expressed as an insert
in a chimeric bacterial outer surface protein (OSP). All
bacteria exhibit proteins on their outer surfaces. Works on
the localization of OSPs and the methods of determining their
structure include: CALA90, HEIJ90, EHRM90, BENZ88a, BENZ88b,
MANO88, BAKE87, RAND87, HANC87, HENR87, NAKA86b, MANO86,
SILH85, TOMM85, NIKA84, LUGT83, and BECK83.

In E. coli, LamB is a preferred OSP. As discussed below,
there are a number of very good alternatives in E. coli and
there are very good alternatives in other bacterial species.
There are also methods for determining the topology of OSPs so
that it is possible to systematically determine where to
insert an ipbd into an osp gene to obtain display of an IPBD
on the surface of any bacterial species.

In view of the extensive knowledge of E. coli, a strain
of E. coli, defective in recombination, is the strongest
candidate as a bacterial GP.

Oliver has reviewed mechanisms of protein secretion in
bacteria (OLIV85a and OLIV87). Nikaido and Vaara (NIKA87),
Benz (BENZ88b), and Baker et al. (BAKE87) have reviewed
mechanisms by which proteins become localized to the outer
membrane of gram-negative bacteria. While most bacterial
proteins remain in the cytoplasm, others are transported to
the periplasmic space (which lies between the plasma membrane
and the cell wall of gram-negative bacteria), or are conveyed
and anchored to the outer surface of the cell. Still others
are exported (secreted) into the medium surrounding the cell.
Those characteristics of a protein that are recognized by a
cell and that cause it to be transported out of the cytoplasm
and displayed on the cell surface will be termed
"outer-surface transport signals".

Gram-negative bacteria have outer-membrane proteins
(OMPs), which form a subset of OSPs. Many OMPs span the
membrane one or more times. The signals that cause OMPs to
localize in the outer membrane are encoded in the amino acid
sequence of the mature protein. Outer membrane proteins of
bacteria are initially expressed in a precursor form including
a so-called signal peptide. The precursor protein is
transported to the inner membrane, and the signal peptide
moiety is extruded into the periplasmic space. There, it is
cleaved off by a "signal peptidase", and the remaining
"mature" protein can now enter the periplasm. Once there,
other cellular mechanisms recognize structures in the mature
protein which indicate that its proper place is on the outer
membrane, and transport it to that location.

It is well known that the DNA coding for the leader or
signal peptide from one protein may be attached to the DNA
sequence coding for another protein, protein X, to form a
chimeric gene whose expression causes protein X to appear free
in the periplasm (BECK83, INOU86 Ch10, LEEC86, MARK86, and
BOQU87). That is, the leader causes the chimeric protein to
be secreted through the lipid bilayer; once in the periplasm,
the leader is cleaved off by the signal peptidase SP-I.

The use of export-permissive bacterial strains (LISS85,
STAD89) increases the probability that a signal-sequence
fusion will direct the desired protein to the cell surface.
Liss et al. (LISS85) showed that the mutation prlA4 makes E.
coli more permissive with respect to signal sequences.
Similarly, Stader et al. (STAD89) found a strain that bears a
prlG mutation and that permits export of a protein that is
blocked from export in wild-type cells. Such
export-permissive strains are preferred.

OSP-IPBD fusion proteins need not fill a structural role
in the outer membranes of Gram-negative bacteria because parts
of the outer membranes are not highly ordered. For large OSPs
there is likely to be one or more sites at which osp can be
truncated and fused to ipbd such that cells expressing the
fusion will display IPBDs on the cell surface. Fusions of
fragments of omp genes with fragments of an x gene have led to
X appearing on the outer membrane (CHAR88b, BENS84, CLEM81).
When such fusions have been made, we can design an osp-ipbd
gene by substituting ipbd for x in the DNA sequence.
Otherwise, a successful OMP-IPBD fusion is preferably sought
by fusing fragments of the best omp to an ipbd, expressing the
fused gene, and testing the resultant GPs for the display-of-IPBD
phenotype. We use the available data about the OMP to pick
the point or points of fusion between omp and ipbd to maximize
the likelihood that IPBD will be displayed. (Spacer DNA
encoding flexible linkers, made, e.g., of GLY, SER, and ASN,
may be placed between the osp- and ipbd-derived fragments to
facilitate display.) Alternatively, we truncate osp at
several sites or in a manner that produces osp fragments of
variable length and fuse the osp fragments to ipbd; cells
expressing the fusion are then screened or selected for display of
IPBDs on the cell surface. Freudl et al. (FREU89) have shown
that fragments of OSPs (such as OmpA) above a certain size are
incorporated into the outer membrane. An additional
alternative is to include short segments of random DNA in the
fusion of omp fragments to ipbd and then screen or select the
resulting variegated population for members exhibiting the
display-of-IPBD phenotype.

In E. coli, the LamB protein is a well understood OSP and
can be used (BENS84, CHAR90, RONC90, VAND90, CHAP90, MOLL90,
CHAR88b, CHAR88c, CLEM81, DARG88, FERE82a, FERE82b, FERE83,
FERE84, FERE86a, FERE86b, FERE89a, FERE89b, GEHR87, HALL82,
NAKA86a, STAD86, HEIN88, BENS87b, BENS87c, BOUG84, BOUL86a,
CHAR84). The E. coli LamB has been expressed in functional
form in S. typhimurium (DEVR84, BARB85, HARK87), V. cholerae
(HARK86), and K. pneumoniae (DEVR84, WEHM89), so that one could
display a population of PBDs in any of these species as a
fusion to E. coli LamB. K. pneumoniae expresses a maltoporin
similar to LamB (WEHM89) which could also be used. In P.
aeruginosa, the D1 protein (a homologue of LamB) can be used
(TRIA88).

LamB of E. coli is a porin for maltose and maltodextrin
transport, and serves as the receptor for adsorption of
bacteriophages λ and K10. LamB is transported to the outer
membrane if a functional N-terminal sequence is present;
further, the first 49 amino acids of the mature sequence are
required for successful transport (BENS84). As with other
OSPs, LamB of E. coli is synthesized with a typical signal
sequence which is subsequently removed. Homology between
parts of LamB protein and other outer membrane proteins OmpC,
OmpF, and PhoE has been detected (NIKA84), including homology
between LamB amino acids 39-49 and sequences of the other
proteins. These subsequences may label the proteins for
transport to the outer membrane.

The amino acid sequence of LamB is known (CLEM81), and a
model has been developed of how it anchors itself to the outer
membrane (reviewed by, among others, BENZ88b). The locations
of its maltose and phage binding domains are also known
(HEIN88). Using this information, one may identify several
strategies by which a PBD insert may be incorporated into LamB
to provide a chimeric OSP which displays the PBD on the
bacterial outer membrane.

When the PBDs are to be displayed by a chimeric
transmembrane protein like LamB, the PBD could be inserted
into a loop normally found on the surface of the cell (cp.
BECK83, MANO86). Alternatively, we may fuse a 5' segment of
the osp gene to the ipbd gene fragment; the point of fusion is
picked to correspond to a surface-exposed loop of the OSP and
the carboxy terminal portions of the OSP are omitted. In
LamB, it has been found that up to 60 amino acids may be
inserted (CHAR88b) with display of the foreign epitope
resulting; the structural features of OmpC, OmpA, OmpF, and
PhoE are so similar that one expects similar behavior from
these proteins.

It should be noted that while LamB may be characterized
as a binding protein, it is used in the present invention to
provide an OSTS; its binding domains are not variegated.

Other bacterial outer surface proteins, such as OmpA,
OmpC, OmpF, PhoE, and pilin, may be used in place of LamB and
its homologues. OmpA is of particular interest because it is
very abundant and because homologues are known in a wide
variety of gram-negative bacterial species. Baker et al.
(BAKE87) review assembly of proteins into the outer membrane
of E. coli and cite a topological model of OmpA (VOGE86) that
predicts that residues 19-32, 62-73, 105-118, and 147-158 are
exposed on the cell surface. Insertion of an ipbd-encoding
fragment at about codon 111 or at about codon 152 is likely to
cause the IPBD to be displayed on the cell surface.
Concerning OmpA, see also MACI88 and MANO88. Porin Protein F
of Pseudomonas aeruginosa has been cloned and has sequence
homology to OmpA of E. coli (DUCH88). Although this homology
is not sufficient to allow prediction of surface-exposed
residues on Porin Protein F, the methods used to determine the
topological model of OmpA may be applied to Porin Protein F.
Works related to use of OmpA as an OSP include BECK80 and
MACI88.
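The relation between the OmpA topological model and the suggested insertion codons can be checked with a small lookup. The sketch below encodes the four surface-exposed segments cited above and assumes that codon numbers and residue numbers refer to the same numbering of the protein. The function name is illustrative only.

```python
# Sketch: test whether a proposed insertion codon lies in an exposed OmpA segment.
# Segments are the surface-exposed residue ranges cited from the model (VOGE86).
OMPA_EXPOSED_SEGMENTS = [(19, 32), (62, 73), (105, 118), (147, 158)]


def exposed_segment(codon, segments=OMPA_EXPOSED_SEGMENTS):
    """Return the (first, last) exposed segment containing `codon`, or None."""
    for first, last in segments:
        if first <= codon <= last:
            return (first, last)
    return None


if __name__ == "__main__":
    for codon in (111, 152, 90):
        print(codon, exposed_segment(codon))
```

Codons 111 and 152 fall near the middle of the third and fourth predicted segments respectively, which is consistent with their selection as insertion sites; the same lookup could be applied to any OSP for which exposed segments have been predicted.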

Misra and Benson (MISR88a, MISR88b) disclose a
topological model of E. coli OmpC that predicts that, among
others, residues GLY164 and LEU250 are exposed on the cell
surface. Thus insertion of an ipbd gene fragment at about
codon 164 or at about codon 250 of the E. coli ompC gene or at
corresponding codons of the S. typhimurium ompC gene is likely
to cause IPBD to appear on the cell surface. The ompC genes
of other bacterial species may be used. Other works related
to OmpC include CATR87 and CLIC88.

OmpF of E. coli is a very abundant OSP, on the order of 10^4 copies/cell.
Pages et al. (PAGE90) have published a model of OmpF
indicating seven surface-exposed segments. Fusion of an ipbd
gene fragment, either as an insert or to replace the 3' part
of ompF, in one of the indicated regions is likely to produce
a functional ompF::ipbd gene the expression of which leads to
display of IPBD on the cell surface. In particular, fusion at
about codon 111, 177, 217, or 245 should lead to a functional
ompF::ipbd gene. Concerning OmpF, see also REID88b, PAGE88,
BENS88, TOMM82, and SODE85.

Pilus proteins are of particular interest because
piliated cells express many copies of these proteins and
because several species (N. gonorrhoeae, P. aeruginosa,
Moraxella bovis, Bacteroides nodosus, and E. coli) express
related pilins. Getzoff and coworkers (GETZ88, PARG87,
SOME85) have constructed a model of the gonococcal pilus that
predicts that the protein forms a four-helix bundle having
structural similarities to tobacco mosaic virus protein and
myohemerythrin. On this model, both the amino and carboxy
termini of the protein are exposed. The amino terminus is
methylated. Elleman (ELLE88) has reviewed pilins of
Bacteroides nodosus and other species; serotype differences
can be related to differences in the pilin protein, and
most variation occurs in the C-terminal region. The
amino-terminal portions of the pilin protein are highly conserved.
Jennings et al. (JENN89) have grafted a fragment of
foot-and-mouth disease virus (residues 144-159) into the B. nodosus
type 4 fimbrial protein which is highly homologous to
gonococcal pilin. They found that expression of the
3'-terminal fusion in P. aeruginosa led to a viable strain that
makes detectable amounts of the fusion protein. Jennings et
al. did not vary the foreign epitope nor did they suggest any
variation. They inserted a GLY-GLY linker between the last
pilin residue and the first residue of the foreign epitope to
provide a "flexible linker". Thus a preferred place to attach
an IPBD is the carboxy terminus. The exposed loops of the
bundle could also be used, although the particular internal
fusions tested by Jennings et al. (JENN89) appeared to be
lethal in P. aeruginosa. Concerning pilin, see also MCKE85
and ORND85.

Judd (JUDD86, JUDD85) has investigated Protein IA of N.
gonorrhoeae and found that the amino terminus is exposed;
thus, one could attach an IPBD at or near the amino terminus
of the mature P.IA as a means to display the IPBD on the N.
gonorrhoeae surface.

A model of the topology of PhoE of E. coli has been
disclosed by van der Ley et al. (VAND86). This model predicts
eight loops that are exposed; insertion of an IPBD into one of
these loops is likely to lead to display of the IPBD on the
surface of the cell. Residues 158, 201, 238, and 275 are
preferred locations for insertion of an IPBD.

Other OSPs that could be used include E. coli BtuB, FepA,
FhuA, IutA, FecA, and FhuE (GUDM89), which are receptors for
nutrients usually found in low abundance. The genes of all
these proteins have been sequenced, but topological models are
not yet available. Gudmundsdottir et al. (GUDM89) have begun
the construction of such a model for BtuB and FepA by showing
that certain residues of BtuB face the periplasm and by
determining the functionality of various BtuB::FepA fusions.
Carmel et al. (CARM90) have reported work of a similar nature
for FhuA. All Neisseria species express outer surface
proteins for iron transport that have been identified and, in
many cases, cloned. See also MORS87 and MORS88.

Many gram-negative bacteria express one or more
phospholipases. E. coli phospholipase A, product of the pldA
gene, has been cloned and sequenced by de Geus et al.
(DEGE84). They found that the protein appears at the cell
surface without any posttranslational processing. An ipbd gene
fragment can be attached at either terminus or inserted at
positions predicted to encode loops in the protein. That
phospholipase A arrives on the outer surface without removal
of a signal sequence does not prove that a PldA::IPBD fusion
protein will also follow this route. Thus we might cause a
PldA::IPBD or IPBD::PldA fusion to be secreted into the
periplasm by addition of an appropriate signal sequence.
Thus, in addition to simple binary fusion of an ipbd fragment
to one terminus of pldA, the constructions:

1) ss::ipbd::pldA

2) ss::pldA::ipbd

should be tested. Once the PldA::IPBD protein is free in the
periplasm it does not remember how it got there and the
structural features of PldA that cause it to localize on the
outer surface will direct the fusion to the same destination.

IV. D. Bacterial Spores as Genetic Packages:

Bacterial spores have desirable properties as GP
candidates. Spores are much more resistant than vegetative
bacterial cells or phage to chemical and physical agents, and
hence permit the use of a great variety of affinity selection
conditions. Also, Bacillus spores neither actively metabolize
nor alter the proteins on their surface. Spores have the
disadvantage that the molecular mechanisms that trigger
sporulation are less well worked out than is the formation of
M13 or the export of protein to the outer membrane of E. coli.

Bacteria of the genus Bacillus form endospores that are
extremely resistant to damage by heat, radiation, desiccation,
and toxic chemicals (reviewed by Losick et al. (LOSI86)).
This phenomenon is attributed to extensive intermolecular
crosslinking of the coat proteins. Endospores from the genus
Bacillus are more stable than are exospores from Streptomyces.
Bacillus subtilis forms spores in 4 to 6 hours, but
Streptomyces species may require days or weeks to sporulate.
In addition, genetic knowledge and manipulation are much more
developed for B. subtilis than for other spore-forming
bacteria. Thus Bacillus spores are preferred over
Streptomyces spores. Bacteria of the genus Clostridium also
form very durable endospores, but Clostridia, being strict
anaerobes, are not convenient to culture.

Viable spores that differ only slightly from wild-type
are produced in B. subtilis even if any one of four coat
proteins is missing (DONO87). Moreover, plasmid DNA is
commonly included in spores, and plasmid-encoded proteins have
been observed on the surface of Bacillus spores (DEBR86). For
these reasons, we expect that it will be possible to express
during sporulation a gene encoding a chimeric coat protein,
without interfering materially with spore formation.

Donovan et al. have identified several polypeptide
components of B. subtilis spore coat (DONO87); the sequences
of two complete coat proteins and amino-terminal fragments of
two others have been determined. Some, but not all, of the
coat proteins are synthesized as precursors and are then
processed by specific proteases before deposition in the spore
coat (DONO87). The 12kd coat protein, CotD, contains 5
cysteines. CotD also contains an unusually high number of
histidines (16) and prolines (7). The 11kd coat protein,
CotC, contains only one cysteine and one methionine. CotC has
a very unusual amino-acid sequence with 19 lysines (K)
appearing as 9 K-K dipeptides and one isolated K. There are
also 20 tyrosines (Y) of which 10 appear as 5 Y-Y dipeptides.
Peptides rich in Y and K are known to become crosslinked in
oxidizing environments (DEVO78, WAIT83, WAIT85, WAIT86). CotC
contains 16 D and E amino acids, a number that nearly equals
the 19 Ks. There are no A, F, R, I, L, N, P, Q, S, or W amino
acids in CotC. Neither CotC nor CotD is post-translationally
cleaved, but the proteins CotA and CotB are.

Since, in B. subtilis, some of the spore coat proteins
are post-translationally processed by specific proteases, it
is valuable to know the sequences of precursors and mature
coat proteins so that we can avoid incorporating the
recognition sequence of the specific protease into our
construction of an OSP-IPBD fusion. The sequence of a mature
spore coat protein contains information that causes the
protein to be deposited in the spore coat; thus gene fusions
that include some or all of a mature coat protein sequence are
preferred for screening or selection for the display-of-IPBD
phenotype.

Fusions of ipbd fragments to cotC or cotD fragments are
likely to cause IPBD to appear on the spore surface. The
genes cotC and cotD are preferred osp genes because CotC and
CotD are not post-translationally cleaved. Subsequences from
cotA or cotB could also be used to cause an IPBD to appear on
the surface of B. subtilis spores, but we must take the
post-translational cleavage of these proteins into account. DNA
encoding IPBD could be fused to a fragment of cotA or cotB at
either end of the coding region or at sites interior to the
coding region. Spores could then be screened or selected for
the display-of-IPBD phenotype.

The promoter of a spore coat protein is most active: a)
when spore coat protein is being synthesized and deposited
onto the spore and b) in the specific place that spore coat
proteins are being made. The sequences of several sporulation
promoters are known; coding sequences operatively linked to
such promoters are expressed only during sporulation. Ray et
al. (RAYC87) have shown that the G4 promoter of B. subtilis is
directly controlled by RNA polymerase bound to σ^E. To date, no
Bacillus sporulation promoter has been shown to be inducible
by an exogenous chemical inducer as is the lac promoter of
E. coli. Nevertheless, the quantity of protein produced from a
sporulation promoter can be controlled by other factors, such
as the DNA sequence around the Shine-Dalgarno sequence or
codon usage. Chemically inducible sporulation promoters can
be developed if necessary.

IV. E. Artificial OSPs 

It is generally preferable to use as the genetic package
a cell, spore or virus for which an outer surface protein
that can be engineered to display an IPBD has already been
identified. However, the present invention is not limited to
such genetic packages.

It is believed that the conditions for an outer surface
transport signal in a bacterial cell or spore are not
particularly stringent, i.e., a random polypeptide of
appropriate length (preferably 30-100 amino acids) has a
reasonable chance of providing such a signal. Thus, by
constructing a chimeric gene comprising a segment encoding the
IPBD linked to a segment of random or pseudorandom DNA (the
potential OSTS), and placing this gene under control of a
suitable promoter, there is a possibility that the chimeric
protein so encoded will function as an OSP-IPBD.

This possibility is greatly enhanced by constructing
numerous such genes, each having a different potential OSTS,
cloning them into a suitable host, and selecting for
transformants bearing the IPBD (or other marker) on their
outer surface. Use of secretion-permissive mutants, such as
prlA4 (LISS85) or prlG (STAD89), can increase the probability
of obtaining a working OSP-IPBD.

When seeking to display an IPBD on the surface of a
bacterial cell, as an alternative to choosing a natural OSP
and an insertion site in the OSP, we can construct a gene (the
"display probe") comprising: a) a regulatable promoter (e.g.,
lacUV5), b) a Shine-Dalgarno sequence, c) a periplasmic
transport signal sequence, d) a fusion of the ipbd gene with a
segment of random DNA (as in Kaiser et al. (KAIS87)), e) a
stop codon, and f) a transcriptional terminator.

When the genetic package is a spore, we can use the
approach described above for attaching an IPBD to an E. coli
cell, except that: a) a sporulation promoter is used, and b)
no periplasmic signal sequence should be present.

For phage, because the OSP-IPBD fulfills a structural
role in the phage coat, it is unlikely that any particular
random DNA sequence coupled to the ipbd gene will produce a
fusion protein that fits into the coat in a functional way.
Nevertheless, random DNA inserted between large fragments of a
coat protein gene and the pbd gene will produce a population
that is likely to contain one or more members that display the
IPBD on the outside of a viable phage.

As previously stated, the purpose of the random DNA is to
encode an OSTS, like that embodied in known OSPs. The fusion
of ipbd and the random DNA could be in either order, but ipbd
upstream is slightly preferred. Isolates from the population
generated in this way can be screened for display of the IPBD.
Preferably, a version of selection-through-binding is used to
select GPs that display IPBD on the GP surface.
Alternatively, clonal isolates of GPs may be screened for the
display-of-IPBD phenotype.

The preference for ipbd upstream of the random DNA arises
from consideration of the manner in which the successful
GP(IPBD) will be used. The present invention contemplates
introducing numerous mutations into the pbd region of the
osp-pbd gene, which, depending on the variegation scheme, might
include gratuitous stop codons. If pbd precedes the random
DNA, then gratuitous stop codons in pbd lead to no OSP-PBD
protein appearing on the cell surface. If pbd follows the
random DNA, then gratuitous stop codons in pbd might lead to
incomplete OSP-PBD proteins appearing on the cell surface.
Incomplete proteins often are non-specifically sticky so that
GPs displaying incomplete PBDs are easily removed from the
population.
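
How often gratuitous stop codons arise depends on the variegation
scheme. As a rough illustration only, the sketch below assumes fully
random (NNN) codons, for which 3 of the 64 triplets are stops, and
estimates the fraction of variegated genes that carry at least one
stop. The function name, the NNN assumption, and the example of six
varied codons are hypothetical and are not taken from the variegation
schemes of this invention.

```python
def fraction_with_stop(n_codons, stop_codons=3, total_codons=64):
    """Probability that at least one of n independent random codons is a stop.

    Defaults assume fully random NNN codons (3 stops among 64 triplets);
    a restricted scheme would use different counts.
    """
    p_no_stop = (1 - stop_codons / total_codons) ** n_codons
    return 1 - p_no_stop


# Assumed example: six fully random codons in the pbd segment.
print(f"{fraction_with_stop(6):.3f}")
```

Under these assumptions roughly a quarter of the genes would contain a
stop codon, which is why the position of pbd relative to the random
DNA is worth considering.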

The random DNA may be obtained in a variety of ways.
Degenerate synthetic DNA is one possibility. Alternatively,
pseudorandom DNA can be generated from any DNA having high
sequence diversity, e.g., the genome of the organism, by
partially digesting with an enzyme that cuts very often, e.g.,
Sau3A I. Alternatively, one could shear DNA having high
sequence diversity, blunt the sheared DNA with the large
fragment of E. coli DNA polymerase I (hereinafter referred to
as Klenow fragment), and clone the sheared and blunted DNA
into blunt sites of the vector (MANI82, p295, AUSU87).
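
The partial-digestion route can be pictured with a small in silico
sketch. It assumes only that Sau3A I recognizes GATC and cuts
immediately 5' of the G; the function names and the demonstration
sequence are hypothetical. A partial digest can yield any fragment
bounded by two cut sites (or by an end of the molecule), which is what
makes the resulting population pseudorandom.

```python
import re

SAU3AI_SITE = "GATC"  # Sau3A I recognition sequence; the cut falls just before the G


def cut_positions(seq, site=SAU3AI_SITE):
    """Return 0-based top-strand cut positions (the index where each site starts)."""
    return [m.start() for m in re.finditer(f"(?={site})", seq.upper())]


def partial_digest_fragments(seq, min_len=1):
    """All fragments bounded by any two cut sites or molecule ends."""
    bounds = sorted(set([0] + cut_positions(seq) + [len(seq)]))
    fragments = set()
    for i, start in enumerate(bounds):
        for end in bounds[i + 1:]:
            if end - start >= min_len:
                fragments.add(seq[start:end])
    return sorted(fragments, key=len)


demo = "AAGATCCCTTGATCGG"  # made-up sequence with two GATC sites
for fragment in partial_digest_fragments(demo):
    print(len(fragment), fragment)
```

A complete digest would give only the fragments between adjacent
sites; the partial digest additionally gives the longer fragments that
span uncut sites.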

If random DNA and phenotypic selection or screening are
used to obtain a GP(IPBD), then we clone random DNA into one
of the restriction sites that was designed into the display
probe. A plasmid carrying the display probe is digested with
the appropriate restriction enzyme and the fragmented, random
DNA is annealed and ligated by standard methods. The ligated
plasmids are used to transform cells that are grown and
selected for expression of the antibiotic-resistance gene.
Plasmid-bearing GPs are then selected for the display-of-IPBD
phenotype by the affinity selection methods described
hereafter, using AfM(IPBD) as if it were the target.

As an alternative to selecting GP(IPBD)s through binding
to an affinity column, we can isolate colonies or plaques and
screen for successful artificial OSPs through use of one of
the methods listed below for verification of the display
strategy.

IV. F. Designing the osp-ipbd gene insert:

Genetic Construction and Expression Considerations

The (i)pbd-osp gene may be: a) completely synthetic, b)
a composite of natural and synthetic DNA, or c) a composite of
natural DNA fragments. The important point is that the pbd
segment be easily variegated so as to encode a multitudinous
and diverse family of PBDs as previously described. A
synthetic ipbd segment is preferred because it allows greatest
control over placement of restriction sites. Primers
complementary to regions abutting the osp-ipbd gene on its 3'
flank and to parts of the osp-ipbd gene that are not to be
varied are needed for sequencing.

The sequences of regulatory parts of the gene are taken
from the sequences of natural regulatory elements: a)
promoters, b) Shine-Dalgarno sequences, and c) transcriptional
terminators. Regulatory elements could also be designed from
knowledge of consensus sequences of natural regulatory
regions. The sequences of these regulatory elements are
connected to the coding regions; restriction sites are also
inserted in or adjacent to the regulatory regions to allow
convenient manipulation.

The essential function of the affinity separation is to
separate GPs that bear PBDs (derived from IPBD) having high
affinity for the target from GPs bearing PBDs having low
affinity for the target. If the elution volume of a GP
depends on the number of PBDs on the GP surface, then a GP
bearing many PBDs with low affinity, GP(PBD_w), might co-elute
with a GP bearing fewer PBDs with high affinity, GP(PBD_s).
Regulation of the osp-pbd gene preferably is such that most
packages display sufficient PBD to effect a good separation
according to affinity. Use of a regulatable promoter to
control the level of expression of the osp-pbd allows fine
adjustment of the chromatographic behavior of the variegated
population.

Induction of synthesis of engineered genes in vegetative
bacterial cells has been exercised through the use of
regulated promoters such as lacUV5, trpP, or tac (MANI82).
The factors that regulate the quantity of protein synthesized
include: a) promoter strength (cf. HOOP87), b) rate of
initiation of translation (cf. GOLD87), c) codon usage, d)
secondary structure of mRNA, including attenuators (cf.
LAND87) and terminators (cf. YAGE87), e) interaction of
proteins with mRNA (cf. MCPH86, MILL87b, WINT87), f)
degradation rates of mRNA (cf. BRAW87, KING86), and g)
proteolysis (cf. GOTT87). These factors are sufficiently well
understood that a wide variety of heterologous proteins can now
be produced in E. coli, B. subtilis and other host cells in at
least moderate quantities (SKER88, BETT88). Preferably, the
promoter for the osp-ipbd gene is subject to regulation by a
small chemical inducer. For example, the lac promoter and the
hybrid trp-lac (tac) promoter are regulatable with isopropyl
thiogalactoside (IPTG). Hereinafter, we use "XINDUCE" as a
generic term for a chemical that induces expression of a gene.
The promoter for the constructed gene need not come from a
natural osp gene; any regulatable bacterial promoter can be
used.

Transcriptional regulation of gene expression is best
understood and most effective, so we focus our attention on
the promoter. If transcription of the osp-ipbd gene is
controlled by the chemical XINDUCE, then the number of
OSP-IPBDs per GP increases for increasing concentrations of
XINDUCE until a fall-off in the number of viable packages is
observed or until sufficient IPBD is observed on the surface
of harvested GP(IPBD)s. The attributes that affect the
maximum number of OSP-IPBDs per GP are primarily structural in
nature. There may be steric hindrance or other unwanted
interactions between IPBDs if OSP-IPBD is substituted for
every wild-type OSP. Excessive levels of OSP-IPBD may also
adversely affect the solubility or morphogenesis of the GP.
For cellular and viral GPs, as few as five copies of a protein
having affinity for another immobilized molecule have resulted
in successful affinity separations (FERE82a, FERE82b, and
SMIT85).

A non-leaky promoter is preferred. Non-leakiness is
useful: a) to show that affinity of GP(osp-ipbd)s for
AfM(IPBD) is due to the osp-ipbd gene, and b) to allow growth
of GP(osp-ipbd) in the absence of XINDUCE if the expression of
osp-ipbd is disadvantageous. The lacUV5 promoter in
conjunction with the LacI^q repressor is a preferred example.

An exemplary osp-ipbd gene has the DNA sequence shown in
Table 25 and there annotated to explain the useful restriction
sites and biologically important features, viz. the lacUV5
promoter, the lacO operator, the Shine-Dalgarno sequence, the
amino acid sequence, the stop codons, and the trp attenuator
transcriptional terminator.

The present invention is not limited to a single method
of gene design. The osp-ipbd gene need not be synthesized in
toto; parts of the gene may be obtained from nature. One may
use any genetic engineering method to produce the correct gene
fusion, so long as one can easily and accurately direct
mutations to specific sites in the pbd DNA subsequence. In
all of the methods of mutagenesis considered in the present
invention, however, it is necessary that the coding sequence
for the osp-ipbd gene be different from any other DNA in the
OCV. The degree and nature of difference needed is determined
by the method of mutagenesis to be used. If the method of
mutagenesis is to be replacement of subsequences coding for
the PBD with vgDNA, then the subsequences to be mutagenized
are preferably bounded by restriction sites that are unique
with respect to the rest of the OCV. Use of non-unique sites
involves partial digestion, which is less efficient than
complete digestion of a unique site and is not preferred. If
single-stranded-oligonucleotide-directed mutagenesis is to be
used, then the DNA sequence of the subsequence coding for the
IPBD must be unique with respect to the rest of the OCV.

The coding portions of genes to be synthesized are
designed at the protein level and then encoded in DNA. The
amino acid sequences are chosen to achieve various goals,
including: a) display of an IPBD on the surface of a GP, b)
change of charge on an IPBD, and c) generation of a population
of PBDs from which to select an SBD. These issues are discussed
in more detail below. The ambiguity in the genetic code is
exploited to allow optimal placement of restriction sites and
to create various distributions of amino acids at variegated
codons.

While the invention does not require any particular
number or placement of restriction sites, it is generally
preferable to engineer restriction sites into the gene to
facilitate subsequent manipulations. Preferably, the gene
provides a series of fairly uniformly spaced unique
restriction sites with no more than a preset maximum number of
bases, for example 100, between sites. Preferably, the gene
is designed so that its insertion into the OCV does not
destroy the uniqueness of unique restriction sites of the OCV.
Preferred recognition sites are those for restriction enzymes
which a) generate cohesive ends, b) have unambiguous
recognition, or c) have higher specific activity.
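
The spacing preference can be checked mechanically on a candidate
design. The sketch below is illustrative only: the function names are
hypothetical, the three recognition sequences (EcoRI, BamHI, HindIII)
are merely familiar examples and not the sites of any gene described
here, and the gap is measured between site start positions as a simple
approximation of "bases between sites".

```python
def site_positions(seq, site):
    """All 0-based start positions of a recognition site in seq (overlaps allowed)."""
    seq, site = seq.upper(), site.upper()
    return [i for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site]


def check_site_spacing(seq, sites, max_gap=100):
    """Return (unique, gaps_ok).

    unique  maps each site name to True if the site occurs exactly once.
    gaps_ok is True if consecutive unique sites start no more than
            max_gap bases apart.
    """
    positions = {name: site_positions(seq, s) for name, s in sites.items()}
    unique = {name: len(p) == 1 for name, p in positions.items()}
    starts = sorted(p[0] for p in positions.values() if len(p) == 1)
    gaps_ok = all(b - a <= max_gap for a, b in zip(starts, starts[1:]))
    return unique, gaps_ok


# Example enzymes chosen for illustration only.
EXAMPLE_SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}
toy_gene = "GAATTC" + "T" * 50 + "GGATCC" + "T" * 120 + "AAGCTT"
print(check_site_spacing(toy_gene, EXAMPLE_SITES, max_gap=100))
```

In this toy design every site is unique, but the second gap exceeds
100 bases, so a further site would be engineered into that stretch by
exploiting codon degeneracy.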




The ambiguity of the DNA between the restriction sites is
resolved from the following considerations. If the given
amino acid sequence occurs in the recipient organism, and if
the DNA sequence of the gene in the organism is known, then,
preferably, we maximize the differences between the engineered
and natural genes to minimize the potential for recombination.
In addition, the following codons are poorly translated in E.
coli and, therefore, are avoided if possible: cta (L), cga
(R), cgg (R), and agg (R). For other host species, different
codon restrictions would be appropriate. Finally, long
repeats of any one base are prone to mutation and thus are
avoided. Balancing these considerations, we can design a DNA
sequence.
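
Two of these considerations, the avoided codons and long
single-base repeats, lend themselves to an automated audit of a
candidate coding sequence. The following is a minimal sketch under
stated assumptions: the function name is hypothetical, the sequence is
assumed to be in frame from its first base, and the threshold of more
than five identical bases for a "long repeat" is an arbitrary
illustrative choice.

```python
import re

# Codons listed in the text as poorly translated in E. coli.
AVOIDED_CODONS = {"cta", "cga", "cgg", "agg"}


def audit_coding_sequence(dna, max_run=5):
    """Return (flagged_codons, long_runs) for an in-frame coding sequence.

    flagged_codons: (codon_index, codon) for each avoided codon.
    long_runs:      (start, run) for each single-base run longer than max_run.
    """
    dna = dna.lower()
    codons = [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]
    flagged = [(i, c) for i, c in enumerate(codons) if c in AVOIDED_CODONS]
    runs = [(m.start(), m.group()) for m in re.finditer(r"(.)\1{%d,}" % max_run, dna)]
    return flagged, runs


# Made-up example: atg cta aaa aaa agg taa
example = "atg" + "cta" + "aaa" * 2 + "agg" + "taa"
print(audit_coding_sequence(example))
```

Flagged codons would be replaced by synonymous codons, and long runs
broken up in the same way, subject to the restriction-site placement
already chosen.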

Structural Considerations 

The design of the amino-acid sequence for the ipbd-osp
gene to encode involves a number of structural considerations.
The design is somewhat different for each type of GP. In
bacteria, OSPs are not essential, so there is no requirement
that the OSP domain of a fusion have any of its parental
functions beyond lodging in the outer membrane.
Relationship between PBD and OSP

It is not required that the PBD and OSP domains have any
particular spatial relationship; hence the process of this
invention does not require use of the method of US Patent
'692.

It is, in fact, desirable that the OSP not constrain the
orientation of the PBD domain; this is not to be confused with
lack of constraint within the PBD. Cwirla et al. (CWIR90),
Scott and Smith (SCOT90), and Devlin et al. (DEVL90) have
taught that variable residues in phage-displayed random
peptides should be free of influence from the phage OSP. We
teach that binding domains having a moderate to high degree of
conformational constraint will exhibit higher specificity and
that higher affinity is also possible. Thus, we prescribe
picking codons for variegation that specify amino acids that
will appear in a well-defined framework. The nature of the
side groups is varied through a very wide range due to the
combinatorial replacement of multiple amino acids. The main
chain conformations of most PBDs of a given class are very
similar. The movement of the PBD relative to the OSP should
not, however, be restricted. Thus it is often appropriate to
include a flexible linker between the PBD and the OSP. Such
flexible linkers can be taken from naturally occurring
proteins known to have flexible regions. For example, the
gIII protein of M13 contains glycine-rich regions thought to
allow the amino-terminal domains a high degree of freedom.
Such flexible linkers may also be designed. Segments of
polypeptides that are rich in the amino acids GLY, ASN, SER,
and ASP are likely to give rise to flexibility. Multiple
glycines are particularly preferred.
Constraints imposed by OSP 

When we choose to insert the PBD into a surface loop of
an OSP such as LamB, OmpA, or M13 gIII protein, there are a
few considerations that do not arise when PBD is joined to the
end of an OSP. In these cases, the OSP exerts some
constraining influence on the PBD; the ends of the PBD are
held in more or less fixed positions. We could insert a
highly varied DNA sequence into the osp gene at codons that
encode a surface-exposed loop and select for cells that have a
specific-binding phenotype. When the identified amino-acid
sequence is synthesized (by any means), the constraint of the
OSP is lost and the peptide is likely to have a much lower
affinity for the target and a much lower specificity. Tan and
Kaiser (TANN77) found that a synthetic model of BPTI
containing all the amino acids of BPTI that contact trypsin
has a K_d for trypsin about 10^7 higher than BPTI. Thus, it is
strongly preferred that the varied amino acids be part of a
PBD in which the structural constraints are supplied by the
PBD.

It is known that the amino acids adjoining foreign 
epitopes inserted into LamB influence the immunological 
properties of these epitopes (VAND90) . We expect that PBDs 
inserted into loops of LamB, OmpA, or similar OSPs will be 
influenced by the amino acids of the loop and by the OSP in 
general. To obtain appropriate display of the PBD, it may be 
necessary to add one or more linker amino acids between the 
OSP and the PBD. Such linkers may be taken from natural 
proteins or designed on the basis of our knowledge of the 
structural behavior of amino acids. Sequences rich in GLY, 
SER, ASN, ASP, ARG, and THR are appropriate. One to five 
amino acids at either junction are likely to impart the 
desired degree of flexibility between the OSP and the PBD. 
Phage OSP 

A preferred site for insertion of the ipbd gene into the 
phage osp gene is one in which: a) the IPBD folds into its 
original shape, b) the OSP domains fold into their original 
shapes, and c) there is no interference between the two 
domains . 

If there is a model of the phage that indicates that 
either the amino or carboxy terminus of an OSP is exposed to 
solvent, then the exposed terminus of that mature OSP becomes 
the prime candidate for insertion of the ipbd gene. A low 
resolution 3D model suffices. 

In the absence of a 3D structure, the amino and carboxy
termini of the mature OSP are the best candidates for
insertion of the ipbd gene. A functional fusion may require
additional residues between the IPBD and OSP domains to avoid
unwanted interactions between the domains. Random-sequence
DNA, or DNA coding for a specific sequence of a protein
homologous to the IPBD or OSP, can be inserted between the osp
fragment and the ipbd fragment if needed.

Fusion at a domain boundary within the OSP is also a good
approach for obtaining a functional fusion. Smith exploited
such a boundary when subcloning heterologous DNA into gene III
of f1 (SMIT85).

The criteria for identifying OSP domains suitable for
causing display of an IPBD are somewhat different from those
used to identify an IPBD. When identifying an OSP, minimal
size is not so important because the OSP domain will not
appear in the final binding molecule nor will we need to
synthesize the gene repeatedly in each variegation round. The
major design concerns are that: a) the OSP::IPBD fusion
causes display of IPBD, b) the initial genetic construction be
reasonably convenient, and c) the osp::ipbd gene be
genetically stable and easily manipulated. There are several
methods of identifying domains. Methods that rely on atomic
coordinates have been reviewed by Janin and Chothia (JANI85).
These methods use matrices of distances between α carbons (Cα),
dividing planes (cf. ROSE85), or buried surface (RASH84).
Chothia and collaborators have correlated the behavior of
many natural proteins with domain structure (according to
their definition). Rashin correctly predicted the stability
of a domain comprising residues 206-316 of thermolysin
(VITA84, RASH84).

Many researchers have used partial proteolysis and
protein sequence analysis to isolate and identify stable
domains. (See, for example, VITA84, POTE83, SCOT87a, and
PABO79.) Pabo et al. used calorimetry as an indicator that
the cI repressor from the coliphage λ contains two domains;
they then used partial proteolysis to determine the location
of the domain boundary.

If the only structural information available is the amino
acid sequence of the candidate OSP, we can use the sequence to
predict turns and loops. There is a high probability that
some of the loops and turns will be correctly predicted (cf.
Chou and Fasman, (CHOU74)); these locations are also
candidates for insertion of the ipbd gene fragment.
Bacterial OSPs

In bacterial OSPs, the major considerations are: a) that 
the PBD is displayed, and b) that the chimeric protein not be 
toxic . 

From topological models of OSPs, we can determine whether
the amino or carboxy termini of the OSP are exposed. If so,
then these are excellent choices for fusion of the osp
fragment to the ipbd fragment.

The lamB gene has been sequenced and is available on a
variety of plasmids (CLEM81, CHAR88). Numerous fusions of
fragments of lamB with a variety of other genes have been used
to study export of proteins in E. coli. From various studies,
Charbit et al. (CHAR88) have proposed a model that specifies
which residues of LamB are: a) embedded in the membrane, b)
facing the periplasm, and c) facing the cell surface; we adopt
the numbering of this model for amino acids in the mature
protein. According to this model, several loops on the outer
surface are defined, including: 1) residues 88 through 111,
2) residues 145 through 165, and 3) residues 236 through 251.

Consider a mini-protein embedded in LamB. For example,
insertion of DNA encoding
G_1 N_2 X_3 C_4 X_5 X_6 X_7 X_8 C_9 X_10 S_11 G_12 (SEQ ID NO:8)
between codons 153 and 154 of lamB is likely to lead to a wide
variety of LamB derivatives being expressed on the surface of
E. coli cells. G_1, N_2, S_11, and G_12 are supplied to allow the
mini-protein sufficient orientational freedom that it can
interact optimally with the target. Using affinity enrichment
(involving, for example, FACS via a fluorescently labeled
target, perhaps through several rounds of enrichment), we
might obtain a strain (named, for example, BEST) that
expresses a particular LamB derivative that shows high
affinity for the predetermined target. An octapeptide having
the sequence of the inserted residues 3 through 10 from BEST
is likely to have an affinity and specificity similar to that
observed in BEST because the octapeptide has an internal
structure that keeps the amino acids in a conformation that is
quite similar in the LamB derivative and in the isolated
mini-protein.
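
The size of the population implied by such an insert is easy to
estimate. The sketch below encodes the twelve-residue insert as a
pattern string in which X marks a varied position; it assumes, purely
for illustration, that each X may be any of the 20 common amino acids
(the actual variegation scheme may allow fewer). The names are
hypothetical.

```python
# G1 N2 X3 C4 X5 X6 X7 X8 C9 X10 S11 G12, with X marking a varied residue.
PATTERN = "GNXCXXXXCXSG"


def library_size(pattern, n_amino_acids=20):
    """Number of distinct protein sequences if each X takes any of n_amino_acids."""
    return n_amino_acids ** pattern.count("X")


def core_peptide(pattern, first=3, last=10):
    """Residues first..last of the insert (1-based, inclusive)."""
    return pattern[first - 1:last]


print(library_size(PATTERN))  # six varied positions
print(core_peptide(PATTERN))  # the octapeptide bounded by X3 and X10
```

The octapeptide retains both cysteines, which is consistent with the
statement that it carries its own internal structure once removed from
LamB.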

Consideration of the Signal Peptide

Fusing one or more new domains to a protein may make the
ability of the new protein to be exported from the cell
different from the ability of the parental protein. The
signal peptide of the wild-type coat protein may function for
the authentic polypeptide but be unable to direct export of a
fusion. To utilize the Sec-dependent pathway, one may need a
different signal peptide. Thus, to express and display a
chimeric BPTI/M13 gene VIII protein, we found it necessary to
utilize a heterologous signal peptide (that of phoA).

Provision of a means to remove PBD from the GP

GPs that display peptides having high affinity for the
target may be quite difficult to elute from the target,
particularly a multivalent target. (Bacteria that are bound
very tightly can simply multiply in situ.) For phage, one can
introduce a cleavage site for a specific protease, such as
blood-clotting Factor Xa, into the fusion OSP protein so that
the binding domain can be cleaved from the genetic package.
Such cleavage has the advantage that all resulting phage have
identical OSPs and therefore are equally infective, even if
polypeptide-displaying phage can be eluted from the affinity
matrix without cleavage. This step allows recovery of
valuable genes which might otherwise be lost. To our
knowledge, no one has disclosed or suggested using a specific
protease as a means to recover an information-containing
genetic package or of converting a population of phage that
vary in infectivity into phage having identical infectivity.

IV. G. Synthesis of Gene Inserts

The present invention is not limited as to how a designed
DNA sequence is divided for easy synthesis. An established
method is to synthesize both strands of the entire gene in
overlapping segments of 20 to 50 nucleotides (nts) (THER88).
An alternative method that is more suitable for synthesis of
vgDNA is an adaptation of methods published by Oliphant et al.
(OLIP86 and OLIP87) and Ausubel et al. (AUSU87). It differs
from previous methods in that it: a) uses two synthetic
strands, and b) does not cut the extended DNA in the middle.
Our goals are: a) to produce longer pieces of dsDNA than can
be synthesized as ssDNA on commercial DNA synthesizers, and b)
to produce strands complementary to single-stranded vgDNA. By
using two synthetic strands, we remove the requirement for a
palindromic sequence at the 3' end.

DNA synthesizers can currently produce oligo-nts of
lengths up to 200 nts in reasonable yield, M_DNA = 200. The
parameters N_w (the length of overlap needed to obtain efficient
annealing) and N_s (the number of spacer bases needed so that a
restriction enzyme can cut near the end of blunt-ended dsDNA)
are determined by DNA and enzyme chemistry. N_w = 10 and N_s = 5
are reasonable values. Larger values of N_w and N_s are allowed
but add to the length of ssDNA that is to be synthesized and
reduce the net length of dsDNA that can be produced.

Let A_L be the actual length of dsDNA to be synthesized,
including any spacers. A_L must be no greater than
(2 M_DNA - N_w). Let Q_w be the number of nts that the overlap
window can deviate from center,

$$Q_w = (2 M_{DNA} - N_w - A_L)/2$$

Q_w is never negative. It is preferred that the two fragments
be approximately the same length so that the amounts
synthesized will be approximately equal. This preference may
be overridden by other considerations. The overall yield of
dsDNA is usually dominated by the synthetic yield of the
longer oligo-nt.

We use the following procedure to generate dsDNA of
lengths up to (2 M_DNA - N_w) nts through the use of Klenow
fragment to extend synthetic ssDNA fragments that are not
more than M_DNA nts long. When a pair of long oligo-nts,
complementary for N_w nts at their 3' ends, are annealed there
will be a free 3' hydroxyl and a long ssDNA chain continuing
in the 5' direction on either side. We will refer to this
situation as a 5' superoverhang. The procedure comprises:

1) picking a non-palindromic subsequence of N_w to N_w+4 nts
   near the center of the dsDNA to be synthesized; this
   region is called the overlap (typically, N_w is 10),

2) synthesizing a ssDNA molecule that comprises that part
   of the anti-sense strand from its 5' end up to and
   including the overlap,

3) synthesizing a ssDNA molecule that comprises that part
   of the sense strand from its 5' end up to and including
   the overlap,

4) annealing the two synthetic strands that are
   complementary throughout the overlap region, and

5) extending both superoverhangs with Klenow fragment and
   all four deoxynucleotide triphosphates.
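
The five steps above can be sketched in code. The following is a minimal, idealized illustration and not part of the disclosed method: the function names (`revcomp`, `split_for_extension`, `klenow_extend`) and the 40-nt toy sequence are hypothetical, annealing is assumed to be perfect, and Klenow fill-in is modeled as simple templated completion of each 3' end.

```python
def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def split_for_extension(antisense, start, n_w):
    """Steps 1-3: given the strand written 5'->3' and the overlap window
    [start, start + n_w), return the two oligos to synthesize (each 5'->3')."""
    end = start + n_w
    oligo_a = antisense[:end]              # 5' end up to and including the overlap
    oligo_b = revcomp(antisense[start:])   # other strand, 5' end through the overlap
    return oligo_a, oligo_b

def klenow_extend(oligo_a, oligo_b, n_w):
    """Steps 4-5: anneal at the 3' overlap and fill in both 5' superoverhangs."""
    overlap = oligo_a[-n_w:]
    if revcomp(oligo_b)[:n_w] != overlap:
        raise ValueError("3' ends are not complementary over the overlap")
    if overlap == revcomp(overlap):
        raise ValueError("overlap is palindromic; an oligo could prime itself")
    top = oligo_a + revcomp(oligo_b)[n_w:]      # extend oligo_a using oligo_b as template
    bottom = oligo_b + revcomp(oligo_a)[n_w:]   # extend oligo_b using oligo_a as template
    return top, bottom

# Toy 40-nt target (hypothetical), overlap of 10 nts starting at index 15.
target = "ATGCGTACCTGCGGTGGCGCTGAAGGTGATGATCCGGCCA"
a, b = split_for_extension(target, 15, 10)
top, bottom = klenow_extend(a, b, 10)
```

In this toy case each synthetic oligo is 25 nts long, and the filled-in product is the full 40-bp blunt-ended duplex.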

Because M_DNA is not rigidly fixed at 200, the current limits of
390 (= 2 M_DNA - N_w) nts overall and 200 in each fragment are not
rigid, but can be exceeded by 5 or 10 nts. Going beyond the
limits of 390 and 200 will lead to lower yields, but these may
be acceptable in certain cases.
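
As a worked illustration of the length bookkeeping, the sketch below computes the maximum dsDNA length and the allowed window deviation Q_w from the parameters named in the text (M_DNA, N_w, N_s, A_L). The helper names and the 290-nt example insert are hypothetical, and Q_w is rounded down to whole nucleotides, which is an assumption made here for convenience.

```python
def actual_length(insert_len, cut_ends, n_s=5):
    """A_L: designed insert plus N_s spacer nts on each end that will be
    cut with a restriction enzyme (0, 1, or 2 ends)."""
    return insert_len + n_s * cut_ends

def design_limits(a_l, m_dna=200, n_w=10):
    """Return (maximum dsDNA length, Q_w) for a target of length a_l."""
    max_len = 2 * m_dna - n_w            # 390 for the default values
    if a_l > max_len:
        raise ValueError("A_L exceeds 2*M_DNA - N_w")
    q_w = (max_len - a_l) // 2           # Q_w = (2*M_DNA - N_w - A_L)/2, floored
    return max_len, q_w

a_l = actual_length(290, cut_ends=2)     # 290-nt insert, both ends cut
limits = design_limits(a_l)
```

With the default values, a 300-nt target leaves the overlap window free to move 45 nts either side of center, while a 390-nt target pins it at the center.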

Restriction enzymes do not cut well at sites closer than
about five base pairs from the end of blunt dsDNA fragments
(OLIP87 and p. 132 New England BioLabs 1990-1991 Catalogue).
Therefore N_s nts (with N_s typically set to 5) of spacer are
added to ends that we intend to cut with a restriction enzyme.
If the plasmid is to be cut with a blunt-cutting enzyme, then
we do not add any spacer to the corresponding end of the
dsDNA fragment.

To choose the optimum site of overlap for the oligo-nt
fragments, first consider the anti-sense strand of the DNA to
be synthesized, including any spacers at the ends, written (in
upper case) from 5' to 3' and left-to-right. N.B.: The N_w nt
long overlap window can never include bases that are to be
variegated. N.B.: The N_w nt long overlap should not be
palindromic lest single DNA molecules prime themselves. Place
a N_w nt long window as close to the center of the anti-sense
sequence as possible. Check to see whether one or more codons
within the window can be changed to increase the GC content
without: a) destroying a needed restriction site, b) changing
amino acid sequence, or c) making the overlap region
palindromic. If possible, change some AT base pairs to GC
pairs. If the GC content of the window is less than 50%,
slide the window right or left as much as Q_w nts to maximize
the number of C's and G's inside the window, but without
including any variegated bases. For each trial setting of the
overlap window, maximize the GC content by silent codon
changes, but do not destroy wanted restriction sites or make
the overlap palindromic. If the best setting still has less
than 50% GC, enlarge the window to N_w+2 nts and place it within
five nts of the center to obtain the maximum GC content. If
enlarging the window one or two nts will increase the GC
content, do so, but do not include variegated bases.
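
The window-placement rules can be illustrated with a short sketch. This is a simplified version under stated assumptions: it only slides a fixed-size window by up to Q_w nts and scores GC content, and it deliberately omits the silent codon changes, restriction-site checks, and window enlargement described above. The function names and the 30-nt toy sequence are hypothetical.

```python
def revcomp(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def gc_fraction(seq):
    return (seq.count("G") + seq.count("C")) / len(seq)

def is_palindromic(seq):
    """True if the sequence equals its own reverse complement."""
    return seq == revcomp(seq)

def choose_overlap(strand, variegated, n_w=10, q_w=0):
    """Return (GC fraction, start index, window) for the best window within
    q_w nts of center, or None. `variegated` holds 0-based positions that
    must stay outside the window. Ties go to the window nearest the center."""
    center = (len(strand) - n_w) // 2
    best = None
    for shift in sorted(range(-q_w, q_w + 1), key=abs):
        start = center + shift
        if start < 0 or start + n_w > len(strand):
            continue
        if any(i in variegated for i in range(start, start + n_w)):
            continue
        window = strand[start:start + n_w]
        if is_palindromic(window):
            continue
        score = gc_fraction(window)
        if best is None or score > best[0]:
            best = (score, start, window)
    return best

toy = "A" * 12 + "GCGGCCGGAC" + "A" * 8      # hypothetical 30-nt strand
free_choice = choose_overlap(toy, set(), n_w=10, q_w=3)
constrained = choose_overlap(toy, {21}, n_w=10, q_w=3)
```

Marking position 21 as variegated forces the window off the GC-richest stretch, which shows how the two N.B. constraints interact with GC maximization.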

Underscore the anti-sense strand from the 5' end up to
the right edge of the window. Write the complementary sense
sequence 3'-to-5' and left-to-right and in lower case letters,
under the anti-sense strand starting at the left edge of the
window and continuing all the way to the right end of the
anti-sense strand.

We will synthesize the underscored anti-sense strand and
the part of the sense strand that we wrote. These two
fragments, complementary over the length of the window of high
GC content, are mixed in equimolar quantities and annealed.
These fragments are extended with Klenow fragment and all four
deoxynucleotide triphosphates to produce blunt-ended dsDNA.
This DNA can be cut with appropriate restriction enzymes to
produce the cohesive ends needed to ligate the fragment to
other DNA.

The present invention is not limited to any particular
method of DNA synthesis or construction. Conventional DNA
synthesizers may be used, with appropriate reagent
modifications for production of variegated DNA (similar to
that now used for production of mixed probes). For example,
the Milligen 7500 DNA synthesizer has seven vials from which
phosphoramidites may be taken. Normally, the first four
contain A, C, T, and G. The other three vials may contain
unusual bases such as inosine or mixtures of bases, the
so-called "dirty bottle". The standard software allows
programmed mixing of two, three, or four bases in equimolar
quantities.

The synthesized DNA may be purified by any art-recognized
technique, e.g., by high-pressure liquid chromatography (HPLC)
or PAGE.

The osp-pbd genes may be created by inserting vgDNA into
an existing parental gene, such as the osp-ipbd shown to be
displayable by a suitably transformed GP. The present
invention is not limited to any particular method of
introducing the vgDNA; however, two techniques are discussed
below.

In the case of cassette mutagenesis, the restriction
sites that were introduced when the gene for the inserted
domain was synthesized are used to introduce the synthetic
vgDNA into a plasmid or other OCV. Restriction digestions and
ligations are performed by standard methods (AUSU87).

In the case of single-stranded-oligonucleotide-directed
mutagenesis, synthetic vgDNA is used to create diversity in
the vector (BOTS85).

The modes of creating diversity in the population of GPs
discussed herein are not the only modes possible. Any method
of mutagenesis that preserves at least a large fraction of the
information obtained from one selection and then introduces
other mutations in the same domain will work. The limiting
factors are the number of independent transformants that can
be produced and the amount of enrichment one can achieve
through affinity separation. Therefore the preferred
embodiment uses a method of mutagenesis that focuses mutations
into those residues that are most likely to affect the binding
properties of the PBD and are least likely to destroy the
underlying structure of the IPBD.

Other modes of mutagenesis might allow other GPs to be
considered. For example, the bacteriophage λ is not a useful
cloning vehicle for cassette mutagenesis because of the
plethora of restriction sites. One can, however, use
single-stranded-oligo-nt-directed mutagenesis on λ without the
need for unique restriction sites. No one has used
single-stranded-oligo-nt-directed mutagenesis to introduce the high
level of diversity called for in the present invention, but if
it is possible, such a method would allow use of phage with
large genomes.

IV.H. Operative Cloning Vector

The operative cloning vector (OCV) is a replicable
nucleic acid used to introduce the chimeric ipbd-osp or
osp-ipbd gene into the genetic package. When the genetic package
is a virus, it may serve as its own OCV. For cells and
spores, the OCV may be a plasmid, a virus, a phagemid, or a
chromosome.

The OCV is preferably small (less than 10 kb), stable
(even after insertion of at least 1 kb DNA), present in
multiple copies within the host cell, and selectable with
appropriate media. It is desirable that cassette mutagenesis
be practical in the OCV; preferably, at least 25 restriction
enzymes are available that do not cut the OCV. It is likewise
desirable that single-stranded mutagenesis be practical. If a
suitable OCV does not already exist, it may be engineered by
manipulation of available vectors.

When the GP is a bacterial cell or spore, the OCV is
preferably a plasmid because genes on plasmids are much more
easily constructed and mutated than are genes in the bacterial
chromosome. When bacteriophage are to be used, the osp-ipbd
gene is inserted into the phage genome. The synthetic
osp-ipbd genes can be constructed in small vectors and transferred
to the GP genome when complete.

Phage such as M13 do not confer antibiotic resistance on
the host, so that one cannot select for cells infected with
M13. An antibiotic resistance gene can be engineered into the
M13 genome (HINE80). More virulent phage, such as φX174, make
discernible plaques that can be picked, in which case a
resistance gene is not essential; furthermore, there is no
room in the φX174 virion to add any new genetic material.
Inability to include an antibiotic resistance gene is a
disadvantage because it limits the number of GPs that can be
screened.

It is preferred that GP(IPBD) carry a selectable marker
not carried by wtGP. It is also preferred that wtGP carry a
selectable marker not carried by GP(IPBD).

A derivative of M13 is the most preferred OCV when the
phage also serves as the GP. Wild-type M13 does not confer
any resistances on infected cells; M13 is a pure parasite. A
"phagemid" is a hybrid between a phage and a plasmid, and is
used in this invention. Double-stranded plasmid DNA isolated
from phagemid-bearing cells is denoted by the standard
convention, e.g. pXY24. Phage prepared from these cells would
be designated XY24. Phagemids such as Bluescript K/S (sold by
Stratagene) are not preferred for our purposes because
Bluescript does not contain the full genome of M13 and must be
rescued by coinfection with competent wild-type M13. Such
coinfections could lead to genetic recombination yielding
heterogeneous phage unsuitable for the purposes of the present
invention. Phagemids may be entirely suitable for developing
a gene that causes an IPBD to appear on the surface of
phage-like genetic packages.

It is also well known that plasmids containing the ColE1
origin of replication can be greatly amplified if protein
synthesis is halted in a log-phase culture. Protein synthesis
can be halted by addition of chloramphenicol or other agents
(MANI82).

The bacteriophage M13 bla 61 (ATCC 37039) is derived from
wild-type M13 through the insertion of the β-lactamase gene
(HINE80). This phage contains 8.13 kb of DNA. M13 bla cat 1
(ATCC 37040) is derived from M13 bla 61 through the additional
insertion of the chloramphenicol resistance gene (HINE80); M13
bla cat 1 contains 9.88 kb of DNA. Although neither of these
variants of M13 contains the ColE1 origin of replication,
either could be used as a starting point to construct a
cloning vector with this feature.

IV.I. Transformation of cells:

When the GP is a cell, the population of GPs is created
by transforming the cells with suitable OCVs. When the GP is
a phage, the phage are genetically engineered and then
transfected into host cells suitable for amplification. When
the GP is a spore, cells capable of sporulation are
transformed with the OCV while in a normal metabolic state,
and then sporulation is induced so as to cause the OSP-PBDs to
be displayed. The present invention is not limited to any one
method of transforming cells with DNA. The procedure given in
the examples is a modification of that of Maniatis (p250,
MANI82). One preferably obtains at least 10^7 and more
preferably at least 10^8 transformants/µg of CCC DNA.

The transformed cells are grown first under non-selective
conditions that allow expression of plasmid genes
and then selected to kill untransformed cells. Transformed
cells are then induced to express the osp-pbd gene at the
appropriate level of induction. The GPs carrying the IPBD or
PBDs are then harvested by methods appropriate to the GP at
hand, generally, centrifugation to pellet the GPs and
resuspension of the pellets in sterile medium (cells) or
buffer (spores or phage). They are then ready for
verification that the display strategy was successful (where
the GPs all display a "test" IPBD) or for affinity selection
(where the GPs display a variety of different PBDs).

IV.J. Verification of Display Strategy:

The harvested packages are tested to determine whether
the IPBD is present on the surface. In any tests of GPs for
the presence of IPBD on the GP surface, any ions or cofactors
known to be essential for the stability of IPBD or AfM(IPBD)
are included at appropriate levels. The tests can be done:
a) by affinity labeling, b) enzymatically, c)
spectrophotometrically, d) by affinity separation, or e) by
affinity precipitation. The AfM(IPBD) in this step is one
picked to have strong affinity (preferably, < 10^-11 M) for
the IPBD molecule and little or no affinity for the wtGP. For
example, if BPTI were the IPBD, trypsin, anhydrotrypsin, or
antibodies to BPTI could be used as the AfM(BPTI) to test for
the presence of BPTI. Anhydrotrypsin, a trypsin derivative
with serine 195 converted to dehydroalanine, has no
proteolytic activity but retains its affinity for BPTI (AKOH72
and HUBE77).

Preferably, the presence of the IPBD on the surface of
the GP is demonstrated through the use of a soluble, labeled
derivative of an AfM(IPBD) with high affinity for IPBD. The
label could be: a) a radioactive atom such as 125I, b) a
chemical entity such as biotin, or c) a fluorescent entity
such as rhodamine or fluorescein. The labeled derivative of
AfM(IPBD) is denoted as AfM(IPBD)*. The preferred procedure
is:

1) mix AfM(IPBD)* with GPs that are to be tested for the
   presence of IPBD; conditions of mixing should favor
   binding of IPBD to AfM(IPBD)*,

2) separate GPs from unbound AfM(IPBD)* by use of:
   a) a molecular sizing filter that will pass AfM(IPBD)*
      but not GPs,
   b) centrifugation, or
   c) a molecular sizing column (such as Sepharose or
      Sephadex) that retains free AfM(IPBD)* but not GPs,

3) quantitate the AfM(IPBD)* bound by GPs.

Alternatively, if the IPBD has a known biochemical activity
(enzymatic or inhibitory), its presence on the GP can be
verified through this activity. For example, if the IPBD were
BPTI, then one could use the stoichiometric inactivation of
trypsin not only to demonstrate the presence of BPTI, but also
to quantitate the amount.

If the IPBD has strong, characteristic absorption bands
in the visible or UV that are distinct from absorption by the
wtGP, then another alternative for measuring the IPBD
displayed on the GP is a spectrophotometric measurement. For
example, if IPBD were azurin, the visible absorption could be
used to identify GPs that display azurin.

Another alternative is to label the GPs and measure the
amount of label retained by immobilized AfM(IPBD). For
example, the GPs could be grown with a radioactive precursor,
such as 32P or 3H-thymidine, and the radioactivity retained by
immobilized AfM(IPBD) measured.

Another alternative is to use affinity chromatography;
the ability of a GP bearing the IPBD to bind a matrix that
supports an AfM(IPBD) is measured by reference to the wtGP.

Another alternative for detecting the presence of IPBD on
the GP surface is affinity precipitation.

If random DNA has been used, then affinity selection
procedures are used to obtain a clonal isolate that has the
display-of-IPBD phenotype. Alternatively, clonal isolates may
be screened for the display-of-IPBD phenotype. The tests of
this step are applied to one or more of these clonal isolates.
If no isolates that bind to the affinity molecule are
obtained we take corrective action as disclosed below.

If one or more of the tests above indicates that the IPBD
is displayed on the GP surface, we verify that the binding of
molecules having known affinity for IPBD is due to the
chimeric osp-ipbd gene through the use of standard genetic and
biochemical techniques, such as:

1) transferring the osp-ipbd gene into the parent GP to
   verify that osp-ipbd confers binding,
2) deleting the osp-ipbd gene from the isolated GP to verify
   that loss of osp-ipbd causes loss of binding,
3) showing that binding of GPs to AfM(IPBD) correlates with
   [XINDUCE] (in those cases that expression of osp-ipbd is
   controlled by [XINDUCE]), and
4) showing that binding of GPs to AfM(IPBD) is specific to
   the immobilized AfM(IPBD) and not to the support matrix.

The following are linear in the amount of IPBD displayed:
a) binding of GPs by soluble AfM(IPBD)*, b) absorption caused
by IPBD, and c) biochemical reactions of IPBD. Presence of
IPBD on the GP surface is indicated by a strong correlation
between [XINDUCE] and the reactions that are linear in the
amount of IPBD. Leakiness of the promoter is not likely to
present problems of high background with assays that are
linear in the amount of IPBD. These experiments may be
quicker and easier than the genetic tests. Interpreting the
effect of [XINDUCE] on binding to a {AfM(IPBD)} column,
however, may be problematic unless the regulated promoter is
completely repressed in the absence of [XINDUCE]. The
affinity retention of GP(IPBD)s is not linear in the number of
IPBDs/GP and there may be, for example, little phenotypic
difference between GPs bearing 5 IPBDs and GPs bearing 50
IPBDs. The demonstration that binding is to AfM(IPBD) and the
genetic tests are essential; the tests with [XINDUCE] are
optional.

We sequence the relevant ipbd gene fragment from each of
several clonal isolates to determine the construction. We
also establish the maximum salt concentration and pH range for
which the GP(IPBD) binds the chosen AfM(IPBD). This is
preferably done by measuring, as a function of salt
concentration and pH, the retention of AfM(IPBD)* on molecular
sizing filters that pass AfM(IPBD)* but not GP. This
information will be used in refining the affinity selection
scheme.

IV.K. Analysis and Correction of Display Problems

If the IPBD is displayed on the outside of the GP, and if
that display is clearly caused by the introduced osp-ipbd
gene, we proceed with variegation; otherwise we analyze the
result and adopt appropriate corrective measures. If we have
unsuccessfully attempted to fuse an ipbd fragment to a natural
osp fragment, our options are: 1) pick a different fusion to
the same osp by a) using the opposite end of osp, b) keeping more
or fewer residues from osp in the fusion, for example, in
increments of 3 or 4 residues, c) trying a known or predicted
domain boundary, or d) trying a predicted loop or turn position,
2) pick a different osp, or 3) switch to the random DNA method.
If we have just tried the random DNA method unsuccessfully,
our options are: 1) choose a different relationship between
ipbd fragment and random DNA (ipbd first, random DNA second or
vice versa), 2) try a different degree of partial digestion, a
different enzyme for partial digestion, a different degree of
shearing or a different source of natural DNA, or 3) switch to
the natural OSP method. If all reasonable OSPs of the current
GP have been tried and the random DNA method has been tried,
both without success, we pick a new GP.

We may illustrate the ways in which problems may be
attacked by using the example of BPTI as the IPBD, the M13
phage as the GP, and the major coat (gene VIII) protein as the
OSP. The following amino-acid sequence, called AA_seq2 (SEQ
ID NO:128), illustrates how the sequence for mature BPTI (SEQ
ID NO:44, shown underscored) may be inserted immediately
after the signal sequence of M13 precoat protein (indicated by
the arrow) and before the sequence for the M13 CP.

AA_seq2 (SEQ ID NO:128)

         1    1    2    2    3    3    4    4    5
    5    0    5    0    5    0    5    0    5    0
                       ↓
MKKSLVLKASVAVATLVPMLSFARPDFCLEPPYTGPCKARIIRYFYNAKA
                       ---------------------------

    5    6    6    7    7    8    8    9    9   10
    5    0    5    0    5    0    5    0    5    0
GLCQTFVYGGCRAKRNNFKSAEDCMRTCGGAAEGDDPAKAAFNSLQASAT
-------------------------------

   10   11   11   12   12   13
    5    0    5    0    5    0
EYIGYAWAMVVVIVGATIGIKLFKKFTSKAS

We adopt the convention that sequence numbers of fusion
proteins refer to the fusion, as coded, unless otherwise
noted. Thus the alanine that begins M13 CP is referred to as
"number 82", "number 1 of M13 CP", or "number 59 of the mature
BPTI-M13 CP fusion".
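
The numbering convention can be checked mechanically. The sketch below is illustrative only: the three segment strings are transcribed from AA_seq2 (with OCR-ambiguous residues in the coat-protein segment read as in wild-type M13 gene VIII protein, which is an assumption), and the function name `locate` is hypothetical.

```python
SIGNAL = "MKKSLVLKASVAVATLVPMLSFA"                                    # residues 1-23
BPTI = "RPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGA"   # residues 24-81
M13_CP = "AEGDDPAKAAFNSLQASATEYIGYAWAMVVVIVGATIGIKLFKKFTSKAS"         # residues 82-131
FUSION = SIGNAL + BPTI + M13_CP

def locate(n):
    """Map a 1-based fusion residue number to
    (segment, number within segment, number in the mature fusion or None)."""
    if not 1 <= n <= len(FUSION):
        raise ValueError("residue number out of range")
    if n <= len(SIGNAL):
        return ("signal", n, None)
    mature = n - len(SIGNAL)          # mature numbering starts after signal cleavage
    if n <= len(SIGNAL) + len(BPTI):
        return ("BPTI", n - len(SIGNAL), mature)
    return ("M13 CP", n - len(SIGNAL) - len(BPTI), mature)

ala82 = locate(82)
cys78 = locate(78)
```

Here residue 82 maps to position 1 of M13 CP and position 59 of the mature fusion, matching the three names given in the text for the same alanine.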

It is desirable to determine where, exactly, the BPTI
binding domain is being transported: is it remaining in the
cytoplasm? Is it free within the periplasm? Is it attached to
the inner membrane? Proteins in the periplasm can be freed
through spheroplast formation using lysozyme and EDTA in a
concentrated sucrose solution (BIRD67, MALA64). If BPTI were
free in the periplasm, it would be found in the supernatant.
Trypsin labeled with 125I would be mixed with supernatant and
passed over a non-denaturing molecular sizing column and the
radioactive fractions collected. The radioactive fractions
would then be analyzed by SDS-PAGE and examined for BPTI-sized
bands by silver staining.

Spheroplast formation exposes proteins anchored in the
inner membrane. Spheroplasts would be mixed with AHTrp* and
then either filtered or centrifuged to separate them from
unbound AHTrp*. After washing with hypertonic buffer, the
spheroplasts would be analyzed for extent of AHTrp* binding.

If BPTI were found free in the periplasm, then we would
expect that the chimeric protein was being cleaved both
between BPTI and the M13 mature coat sequence and between BPTI
and the signal sequence. In that case, we should alter the
BPTI/M13 CP junction by inserting vgDNA at codons for residues
78-82 of AA_seq2.

If BPTI were found attached to the inner membrane, then
two hypotheses can be formed. The first is that the chimeric
protein is being cut after the signal sequence, but is not
being incorporated into LG7 virion; the treatment would also
be to insert vgDNA between residues 78 and 82 of AA_seq2. The
alternative hypothesis is that BPTI could fold and react with
trypsin even if signal sequence is not cleaved. N-terminal
amino acid sequencing of trypsin-binding material isolated
from cell homogenate determines what processing is occurring.
If signal sequence were being cleaved, we would use the
procedure above to vary residues between C78 and A82;
subsequent passes would add residues after residue 81. If
signal sequence were not being cleaved, we would vary residues
between 23 and 27 of AA_seq2. Subsequent passes through that
process would add residues after 23.

If BPTI were found neither in the periplasm nor on the
inner membrane, then we would expect that the fault was in the
signal sequence or the signal-sequence-to-BPTI junction. The
treatment in this case would be to vary residues between 23
and 27.

Analytical experiments to determine what has gone wrong
take time and effort and, for the foreseen outcomes, indicate
variations in only two regions. Therefore, we believe it
prudent to try the synthetic experiments described below
without doing the analysis. For example, these six
experiments that introduce variegation into the bpti-gene VIII
fusion could be tried:

1) 3 variegated codons between residues 78 and 82 using
   olig#12 and olig#13,

2) 3 variegated codons between residues 23 and 27 using
   olig#14 and olig#15,

3) 5 variegated codons between residues 78 and 82 using
   olig#13 and olig#12a,

4) 5 variegated codons between residues 23 and 27 using
   olig#15 and olig#14a,

5) 7 variegated codons between residues 78 and 82 using
   olig#13 and olig#12b, and

6) 7 variegated codons between residues 23 and 27 using
   olig#15 and olig#14b.
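
To put the three library sizes in perspective, the sketch below counts the distinct DNA and amino-acid sequences for 3, 5, and 7 variegated codons. It assumes each variegated codon has the form used in these oligo-nts (any of four bases at positions one and two, T or G at position three, hence 32 codons covering all 20 amino acids); stop codons are ignored in the amino-acid count, and the function name is hypothetical.

```python
CODONS_PER_POSITION = 4 * 4 * 2      # 32 codons when the third base is T or G
AMINO_ACIDS_PER_POSITION = 20

def library_sizes(n_codons):
    """Return (distinct DNA sequences, distinct amino-acid sequences)."""
    return (CODONS_PER_POSITION ** n_codons,
            AMINO_ACIDS_PER_POSITION ** n_codons)

sizes = {n: library_sizes(n) for n in (3, 5, 7)}
for n, (dna, protein) in sizes.items():
    print(f"{n} codons: {dna:,} DNA sequences, {protein:,} amino-acid sequences")
```

Compared with the 10^7 to 10^8 transformants per µg mentioned in section IV.I, the 3-codon library is small, the 5-codon library is of comparable size, and the 7-codon library exceeds that range.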

To alter the BPTI-M13 CP junction, we introduce DNA
variegated at codons for residues between 78 and 82 into the
SphI and SfiI sites of pLG7. The residues after the last
cysteine are highly variable in amino acid sequences
homologous to BPTI, both in composition and length; in Table
25 these residues are denoted as G79, G80, and A81. The first
part of the M13 CP is denoted as A82, E83, and G84. One of
the oligo-nts olig#12, olig#12a, or olig#12b and the primer
olig#13 are synthesized by standard methods. The oligo-nts
are:

olig#12 (SEQ ID NO:129)

residue       75  76  77  78  79  80  81  82  83
5' gc|gag|cGC|ATG|CGT|ACC|TGC|qfk|qfk|qfk|GCT|GAA|-
   84  85  86  87  88  89  90  91
   GGT|GAT|GAT|CCG|GCC|AAA|GCG|GCC|gcg|cc 3'

olig#12a (SEQ ID NO:130)

residue       75  76  77  78  79  80  81  81a 81b
5' gc|gag|cGC|ATG|CGT|ACC|TGC|qfk|qfk|qfk|qfk|qfk|-
   82  83  84  85  86  87
   GCT|GAA|GGT|GAT|GAT|CCG|-
   88  89  90  91
   GCC|AAA|GCG|GCC|gcg|cc 3'

olig#12b (SEQ ID NO:131)

residue       75  76  77  78  79  80  81  81a 81b
5' gc|gag|cGC|ATG|CGT|ACC|TGC|qfk|qfk|qfk|qfk|qfk|-
   81c 81d 82  83  84  85  86  87
   qfk|qfk|GCT|GAA|GGT|GAT|GAT|CCG|-
   88  89  90  91
   GCC|AAA|GCG|GCC|gcg|cc 3'

olig#13 (SEQ ID NO:132)

residue   91  90  89  88  87  86
5' gg|cgc|GGC|CGC|TTT|GGC|CGG|ATC 3'

where q is a mixture of (0.26 T, 0.18 C, 0.26 A, and 0.30 G), f
is a mixture of (0.22 T, 0.16 C, 0.40 A, and 0.22 G), and k is
a mixture of equal parts of T and G. The bases shown in lower
case at either end are spacers and are not incorporated into
the cloned gene. The primer is complementary to the 3' end of
each of the longer oligo-nts. One of the variegated oligo-nts
and the primer olig#13 are combined in equimolar amounts and
annealed. The dsDNA is completed with all four (nt)TPs and
Klenow fragment. The resulting dsDNA and RF pLG7 are cut with
both SfiI and SphI, purified, mixed, and ligated. We then
select a transformed clone that, when induced with IPTG, binds
AHTrp.
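
The amino-acid distribution implied by a qfk codon can be worked out from the stated base mixtures. The sketch below is an illustration, not part of the disclosure: it assumes the three positions are sampled independently and uses the standard genetic code; the names `CODON_TABLE` and `codon_distribution` are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

Q = {"T": 0.26, "C": 0.18, "A": 0.26, "G": 0.30}   # first codon position
F = {"T": 0.22, "C": 0.16, "A": 0.40, "G": 0.22}   # second codon position
K = {"T": 0.50, "G": 0.50}                         # third codon position

def codon_distribution(first, second, third):
    """Probability of each amino acid ('*' = stop) for one variegated codon."""
    dist = {}
    for (a, pa), (b, pb), (c, pc) in product(first.items(),
                                             second.items(),
                                             third.items()):
        aa = CODON_TABLE[a + b + c]
        dist[aa] = dist.get(aa, 0.0) + pa * pb * pc
    return dist

qfk = codon_distribution(Q, F, K)
for aa, p in sorted(qfk.items(), key=lambda item: -item[1]):
    print(f"{aa}: {p:.4f}")
```

Because the third base is restricted to T or G, the only stop codon that can arise is TAG, and all 20 amino acids remain accessible.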

To vary the junction between M13 signal sequence and
BPTI, we introduce DNA variegated at codons for residues
between 23 and 27 into the KpnI and XhoI sites of pLG7. The
first three residues are highly variable in amino acid
sequences homologous to BPTI. Homologous sequences also vary
in length at the amino terminus. One of the oligo-nts
olig#14, olig#14a, or olig#14b and the primer olig#15 are
synthesized by standard methods. The oligo-nts are:

olig#14 (SEQ ID NO:133)

residue      17  18  19  20  21  22  23  24  25
5' g|gcc|gcG|GTA|CCG|ATG|CTG|TCT|TTT|GCT|qfk|qfk|-
    26  27  28  29  30
   |qfk|TTC|TGT|CTC|GAG|cgc|ccg|cga| 3'

olig#14a (SEQ ID NO:134)

residue    17  18  19  20  21  22  23  24  25  26
5' gcc|gcG|GTA|CCG|ATG|CTG|TCT|TTT|GCT|qfk|qfk|qfk|-
    26a 26b 27  28  29  30
   |qfk|qfk|TTC|TGT|CTC|GAG|cgc|ccg|cga| 3'

olig#14b (SEQ ID NO:135)

residue      17  18  19  20  21  22  23  24  25  26
5' g|gcc|gcG|GTA|CCG|ATG|CTG|TCT|TTT|GCT|qfk|qfk|qfk|-
    26a 26b 26c 26d 27  28  29  30
   |qfk|qfk|qfk|qfk|TTC|TGT|CTC|GAG|cgc|ccg|cga| 3'

olig#15 (SEQ ID NO:136)

5' |tcg|cgg|gcg|CTC|GAG|ACA|GAA| 3'

where q is a mixture of (0.26 T, 0.18 C, 0.26 A, and 0.30 G),
f is a mixture of (0.22 T, 0.16 C, 0.40 A, and 0.22 G), and k
is a mixture of equal parts of T and G. The bases shown in
lower case at either end are spacers and are not incorporated
into the cloned gene. One of the variegated oligo-nts and the
primer are combined in equimolar amounts and annealed. The
dsDNA is completed with all four (nt)TPs and Klenow fragment.
The resulting dsDNA and RF pLG7 are cut with both KpnI and
XhoI, purified, mixed, and ligated. We select a transformed
clone that, when induced with IPTG, binds AHTrp or trp.

Other numbers of variegated codons could be used.

If none of these approaches produces a working chimeric
protein, we may try a different signal sequence. If that
doesn't work, we may try a different OSP.
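
Since each primer must anneal to the 3' end of its variegated oligo-nt, the primer sequences can be checked as reverse complements of those 3' ends. The sketch below does this for olig#13 and olig#15; the constant names and the case-preserving `revcomp` helper are hypothetical, and the sequences are transcribed from the listings above with the codon separators removed.

```python
# Case is preserved so that lower-case spacer bases stay recognizable.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def revcomp(seq):
    """Reverse complement of a DNA string, preserving letter case."""
    return seq.translate(_COMPLEMENT)[::-1]

# Shared 3' end of olig#12, olig#12a, olig#12b (codons 86-91 plus spacer).
OLIG12_3PRIME = "GATCCGGCCAAAGCGGCCgcgcc"
OLIG13 = "ggcgcGGCCGCTTTGGCCGGATC"

# Shared 3' end of olig#14, olig#14a, olig#14b (codons 27-30 plus spacer).
OLIG14_3PRIME = "TTCTGTCTCGAGcgcccgcga"
OLIG15 = "tcgcgggcgCTCGAGACAGAA"

pairs_ok = (revcomp(OLIG12_3PRIME) == OLIG13,
            revcomp(OLIG14_3PRIME) == OLIG15)
print(pairs_ok)
```

Both comparisons hold for the sequences as listed, so each primer anneals over the full constant 3' region, leaving the variegated codons to be copied by Klenow extension.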
V. AFFINITY SELECTION OF TARGET -BINDING MUTANTS 
V.A. Affinity Separation Technology, Generally 

Affinity separation is used initially in the present 
15 invention to verify that the display system is working, i.e. , 
that a chimeric outer surface protein has been expressed and 
transported to the surface of the genetic package and is 
oriented so that the inserted binding domain is accessible to 
target material. When used for this purpose, the binding 

2 0 domain is a known binding domain for a particular target and 

that target is the affinity molecule used in the affinity 
separation process. For example, a display system may be 
validated by using inserting DNA encoding BPTI into a gene 
encoding an outer surface protein of the genetic package of 
25 interest, and testing for binding to anhydrotrypsin, which is 
normally bound by BPTI. 

If the genetic packages bind to the target, then we have
confirmation that the corresponding binding domain is indeed
displayed by the genetic package. Packages which display the
binding domain (and thereby bind the target) are separated
from those which do not.

Once the display system is validated, it is possible to
use a variegated population of genetic packages which display
a variety of different potential binding domains, and use
affinity separation technology to determine how well they bind
to one or more targets. This target need not be one bound by
a known binding domain which is parental to the displayed
binding domains, i.e., one may select for binding to a new
target.

For example, one may variegate a BPTI binding domain and
test for binding, not to trypsin, but to another serine
protease, such as human neutrophil elastase or cathepsin G, or
even to a wholly unrelated target, such as horse heart
myoglobin.

The term "affinity separation means" includes, but is not
limited to: a) affinity column chromatography, b) batch
elution from an affinity matrix material, c) batch elution
from an affinity material attached to a plate, d) fluorescence
activated cell sorting, and e) electrophoresis in the presence
of target material. "Affinity material" is used to mean a
material with affinity for the material to be purified, called
the "analyte". In most cases, the association of the affinity
material and the analyte is reversible so that the analyte can
be freed from the affinity material once the impurities are
washed away.

The procedures described in sections V.H, V.I and V.J are
not required for practicing the present invention, but may
facilitate the development of novel binding proteins thereby.
V.B. Affinity Chromatography, Generally 

Affinity column chromatography, batch elution from an
affinity matrix material held in some container, and batch
elution from a plate are very similar and hereinafter will be
treated under "affinity chromatography."

If affinity chromatography is to be used, then:
1) the molecules of the target material must be of
   sufficient size and chemical reactivity to be applied to
   a solid support suitable for affinity separation,
2) after application to a matrix, the target material
   preferably does not react with water,
3) after application to a matrix, the target material
   preferably does not bind or degrade proteins in a
   non-specific way, and
4) the molecules of the target material must be sufficiently
   large that attaching the material to a matrix allows
   enough unaltered surface area (generally at least 500 Å²,
   excluding the atom that is connected to the linker) for
   protein binding.

Affinity chromatography is the preferred separation
means, but FACS, electrophoresis, or other means may also be
used.

V.C. Fluorescence-Activated Cell Sorting, Generally

Fluorescence-activated cell sorting (FACS) involves use of an
affinity material that is fluorescent per se or is labeled
with a fluorescent molecule. Current commercially available
cell sorters require 800 to 1000 molecules of fluorescent dye,
such as Texas red, bound to each cell. FACS can sort 10^3 cells
or viruses/sec.

FACS (e.g., FACStar from Becton-Dickinson, Mountain View,
CA) is most appropriate for bacterial cells and spores because
the sensitivity of the machines requires approximately 1000
molecules of fluorescent label bound to each GP to accomplish
a separation. OSPs such as OmpA, OmpF, and OmpC are present at
≥10^4/cell, often as much as 10^5/cell. Thus use of FACS with
PBDs displayed on one of the OSPs of a bacterial cell is
attractive. This is particularly true if the target is quite
small so that attachment to a matrix has a much greater effect
than would attachment to a dye. To optimize FACS separation
of GPs, we use a derivative of Afm(IPBD) that is labeled with
a fluorescent molecule, denoted Afm(IPBD)*. The variables to
be optimized include: a) amount of IPBD/GP, b) concentration
of Afm(IPBD)*, c) ionic strength, d) concentration of GPs, and
e) parameters pertaining to operation of the FACS machine.
Because Afm(IPBD)* and GPs interact in solution, the binding
will be linear in both [Afm(IPBD)*] and [displayed IPBD].
Preferably, these two parameters are varied together. The
other parameters can be optimized independently.
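
The linearity noted above can be illustrated with a simple 1:1
equilibrium model. The following is a minimal sketch and not part
of the original disclosure: the function names, the dissociation
constant, and the concentrations are hypothetical values chosen
only for illustration. It compares the exact equilibrium amount of
complex with the low-occupancy approximation
[complex] ≈ [Afm(IPBD)*][displayed IPBD]/Kd, which is expected to
hold only when both concentrations are well below Kd.

```python
import math


def complex_exact(a_total, b_total, kd):
    """Equilibrium concentration of a 1:1 complex (all values in mol/L)."""
    s = a_total + b_total + kd
    # Smaller root of c^2 - s*c + a_total*b_total = 0, written in a
    # form that avoids subtracting two nearly equal numbers.
    return 2.0 * a_total * b_total / (s + math.sqrt(s * s - 4.0 * a_total * b_total))


def complex_linear(a_total, b_total, kd):
    """Low-occupancy approximation: linear in each concentration."""
    return a_total * b_total / kd


# Hypothetical values: labeled affinity molecule, displayed IPBD, and Kd.
afm_star = 1e-9   # mol/L (assumption)
ipbd = 1e-10      # mol/L (assumption)
kd = 1e-6         # mol/L (assumption)

exact = complex_exact(afm_star, ipbd, kd)
approx = complex_linear(afm_star, ipbd, kd)
doubled = complex_exact(2 * afm_star, ipbd, kd)
print(f"exact={exact:.3e}  linear={approx:.3e}  ratio on doubling={doubled / exact:.3f}")
```

Under these assumed values, doubling either concentration roughly
doubles the amount of complex, which is one way to see why the two
parameters are conveniently varied together.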

If FACS is to be used as the affinity separation means,
then:
1) the molecules of the target material must be of
   sufficient size and chemical reactivity to be conjugated
   to a suitable fluorescent dye or the target must itself
   be fluorescent,
2) after any necessary fluorescent labeling, the target
   preferably does not react with water,
3) after any necessary fluorescent labeling, the target
   material preferably does not bind or degrade proteins in
   a non-specific way, and
4) the molecules of the target material must be sufficiently
   large that attaching the material to a suitable dye
   allows enough unaltered surface area (generally at least
   500 Å², excluding the atom that is connected to the
   linker) for protein binding.
V.D. Affinity Electrophoresis, Generally 

Electrophoretic affinity separation involves
electrophoresis of viruses or cells in the presence of target
material, wherein the binding of said target material changes
the net charge of the virus particles or cells. It has been
used to separate bacteriophages on the basis of charge
(SERW87).

Electrophoresis is most appropriate to bacteriophage
because of their small size (SERW87). Electrophoresis is a
preferred separation means if the target is so small that
chemically attaching it to a column or to a fluorescent label
would essentially change the entire target. For example,
chloroacetate ions contain only seven atoms and would be
essentially altered by any linkage. GPs that bind
chloroacetate would become more negatively charged than GPs
that do not bind the ion and so these classes of GPs could be
separated.

If affinity electrophoresis is to be used, then:
1) the target must either be charged or of such a nature
   that its binding to a protein will change the charge of
   the protein,
2) the target material preferably does not react with water,
3) the target material preferably does not bind or degrade
   proteins in a non-specific way, and
4) the target must be compatible with a suitable gel
   material.

The present invention makes use of affinity separation of
bacterial cells, or bacterial viruses (or other genetic
packages) to enrich a population for those cells or viruses
carrying genes that code for proteins with desirable binding
properties.
V.E. Target Materials 

The present invention may be used to select for binding
domains which bind to one or more target materials, and/or
fail to bind to one or more target materials. Specificity, of
course, is the ability of a binding molecule to bind strongly
to a limited set of target materials, while binding more
weakly or not at all to another set of target materials from
which the first set must be distinguished.

The target materials may be organic macromolecules, such
as polypeptides, lipids, polynucleic acids, and
polysaccharides, but are not so limited. Almost any molecule
that is stable in aqueous solvent may be used as a target.
The following list of possible targets is given as
illustration and not as limitation. The categories are not
strictly mutually exclusive. The omission of any category is
not to be construed to imply that said category is unsuitable
as a target. Merck Index refers to the Eleventh Edition.

A. Peptides
   1) human β-endorphin (Merck Index 3528)
   2) dynorphin (MI 3458)
   3) Substance P (MI 8834)
   4) Porcine somatostatin (MI 8671)
   5) human atrial natriuretic factor (MI 887)
   6) human calcitonin
   7) glucagon

B. Proteins
I. Soluble Proteins
a. Hormones
   1) human TNF (MI 9411)
   2) Interleukin-1 (MI 4895)
   3) Interferon-γ (MI 4894)
   4) Thyrotropin (MI 9709)
   5) Interferon-α (MI 4892)
   6) Insulin (MI 4887, p. 789)

b. Enzymes
   1) human neutrophil elastase
   2) Human thrombin
   3) human Cathepsin G
   4) human tryptase
   5) human chymase
   6) human blood clotting Factor Xa
   7) any retro-viral Pol protease
   8) any retro-viral Gag protease
   9) dihydrofolate reductase
   10) Pseudomonas putida cytochrome P450cam
   11) human pyruvate kinase
   12) E. coli pyruvate kinase
   13) jack bean urease
   14) aspartate transcarbamylase (E. coli)
   15) ras protein
   16) any protein-tyrosine kinase

c. Inhibitors
   1) aprotinin (MI 784)
   2) human α1-antitrypsin
   3) phage λ cI (inhibits DNA transcription)

d. Receptors
   1) TNF receptor
   2) IgE receptor
   3) LamB
   4) CD4
   5) IL-1 receptor

e. Toxins
   1) ricin (also an enzyme)
   2) α-Conotoxin GI
   3) melittin
   4) Bordetella pertussis adenylate cyclase (also an
      enzyme)
   5) Pseudomonas aeruginosa hemolysin
f. Other proteins
   1) horse heart myoglobin
   2) human sickle-cell haemoglobin
   3) human deoxy haemoglobin
   4) human CO haemoglobin
   5) human low-density lipoprotein (a lipoprotein)
   6) human IgG (combining site removed or blocked) (a
      glycoprotein)
   7) influenza haemagglutinin
   8) phage λ capsid
   9) fibrinogen
   10) HIV-1 gp120
   11) Neisseria gonorrhoeae pilin
   12) fibril or flagellar protein from spirochaete
       bacterial species such as those that cause
       syphilis, Lyme disease, or relapsing fever
   13) pro-enzymes such as prothrombin and trypsinogen

II. Insoluble Proteins
   1) silk
   2) human elastin
   3) keratin
   4) collagen
   5) fibrin

C. Nucleic acids
a. DNA
   1) ds DNA:  5'-ACTAGTCTC-3'
               3'-TGATCAGAG-5'
   2) ds DNA:  5'-CCGTCGAATCCGC-3'  (SEQ ID NO: 90)
               3'-GGCAGTTTAGGCG-5'  (SEQ ID NO: 91)
               (Note mismatch)
   3) ss DNA:  5'-CGTAACCTCGTCATTA-3'
               (No hairpin) (SEQ ID NO: 92)
   4) ss DNA:  5'-CCGTAGGT ┐
               3'-GGCATCCA ┘
               (Note hairpin) (SEQ ID NO: 93)
   5) dsDNA with cohesive ends:
               5'-CACGGCTATTACGGT-3'  (SEQ ID NO: 94)
               3'-   CCGATAATGCCA-5'  (SEQ ID NO: 95)

b. RNA 

1) yeast Phe tRNA 

2) ribosomal RNA 

3) segment of mRNA 

D. Organic molecules (not peptide, protein, or nucleic acid)
I. Small and monomeric
   1) cholesterol
   2) aspartame
   3) bilirubin
   4) morphine
   5) codeine
   6) heroin
   7) dichlorodiphenyltrichloroethane (DDT)
   8) prostaglandin PGE2
   9) actinomycin
   10) 2,2,3-trimethyldecane
   11) Buckminsterfullerene
   12) cortivazol (MI 2536, p. 397)

II. Polymers
   1) cellulose
   2) chitin

III. Others
   1) O-antigen of Salmonella enteritidis (a
      lipopolysaccharide)

E. Inorganic compounds
   1) asbestos
   2) zeolites
   3) hydroxylapatite
   4) 111 face of crystalline silicon
   5) paulingite
   6) U(IV) (uranium ions)
   7) Au(III) (gold ions)

F. Organometallic compounds
   1) iron(III) haem
   2) cobalt haem
   3) cobalamine
   4) (isopropylamino)6Cr(III)
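
Among the DNA targets listed under "Nucleic acids" above, the first
duplex is fully complementary while the second is annotated "Note
mismatch". The following is a minimal sketch, not part of the
original disclosure, of how one might locate such a mismatch; the
function name and the convention that the lower strand is written
3' to 5' (as in the list) are assumptions made for illustration.

```python
# Watson-Crick pairs, written as (top base, bottom base).
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def mismatch_positions(top_5_to_3, bottom_3_to_5):
    """Return 1-based positions where the aligned strands do not pair."""
    if len(top_5_to_3) != len(bottom_3_to_5):
        raise ValueError("strands must be the same length for this simple check")
    return [
        i + 1
        for i, pair in enumerate(zip(top_5_to_3, bottom_3_to_5))
        if pair not in WATSON_CRICK
    ]


duplex_1 = mismatch_positions("ACTAGTCTC", "TGATCAGAG")
duplex_2 = mismatch_positions("CCGTCGAATCCGC", "GGCAGTTTAGGCG")
print(duplex_1, duplex_2)
```

In the second duplex the check reports a single G·T pair at
position 6 of the upper strand.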

Serine proteases are an especially interesting class of
potential target materials. Serine proteases are ubiquitous
in living organisms and play vital roles in processes such as:
digestion, blood clotting, fibrinolysis, immune response,
fertilization, and post-translational processing of peptide
hormones. Although the role these enzymes play is vital,
uncontrolled or inappropriate proteolytic activity can be very
damaging. Several serine proteases are directly involved in
serious disease states. Uncontrolled neutrophil elastase (NE)
(also known as leukocyte elastase) is thought to be the major
cause of emphysema (BEIT86, HUBB86, HUBB89, HUTC87, SOMM90,
WEWE87) whether caused by congenital lack of α-1-antitrypsin
or by smoking. NE is also implicated as an essential
ingredient in the pernicious cycle of:

   (excess secretion of proteases by neutrophils)
      -> (inflammation)
      -> (recruitment of neutrophils)
      -> (excess secretion of proteases by neutrophils)


observed in cystic fibrosis (CF) (NADE90). Inappropriate NE
activity is very harmful and to stop the progression of
emphysema or to alleviate the symptoms of CF, an inhibitor of
very high affinity is needed. The inhibitor must be very
specific to NE lest it inhibit other vital serine proteases or
esterases. Nadel (NADE90) has suggested that onset of excess
secretion is initiated by 10^-10 M NE; thus, the inhibitor must
reduce the concentration of free NE to well below this level.
Thus human neutrophil elastase is a preferred target and a
highly stable protein is a preferred IPBD. In particular,
BPTI, ITI-D1, or another BPTI homologue is a preferred IPBD
for development of an inhibitor to HNE. Other preferred IPBDs
for making an inhibitor to HNE include CMTI-III, SLPI, Eglin,
α-conotoxin GI, and Ω-conotoxins.

HNE is not the only serine protease for which an
inhibitor would be valuable. Works concerning uses of
protease inhibitors and diseases thought to result from
inappropriate protease activity include: NADE87, REST88,
SOMM90, and SOMM89. Tryptase and chymase may be involved in
asthma, see FRAN89 and VAND89. There are reports that suggest
that Proteinase 3 (also known as p29) is as important or even
more important than HNE; see NILE89, ARNA90, KAOR88, CAMP90,
and GUPT90. Cathepsin G is another protease that may cause
disease when present in excess; see FERR90, PETE89, SALV87,
and SOMM90. These works indicate that a problem exists and
that blocking one or another protease might well alleviate a
disease state. Some of the cited works report inhibitors
having measurable affinity for a target protease, but none
report truly excellent inhibitors that have Kd in the range of
10^-12 M as may be obtained by the method of the present
invention. The same IPBDs used for HNE can be used for any
serine protease.
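
The link between inhibitor affinity and the 10^-10 M threshold
for free NE discussed above can be made concrete with a 1:1
binding model. The following is a minimal sketch and not part of
the original disclosure: the function name and the total enzyme
and inhibitor concentrations are hypothetical, chosen only to
show how the free enzyme level depends on Kd.

```python
import math


def free_enzyme(e_total, i_total, kd):
    """Free enzyme at equilibrium for 1:1 enzyme-inhibitor binding (mol/L)."""
    s = e_total + i_total + kd
    # Concentration of enzyme:inhibitor complex (smaller quadratic root,
    # written to avoid loss of precision).
    bound = 2.0 * e_total * i_total / (s + math.sqrt(s * s - 4.0 * e_total * i_total))
    return e_total - bound


THRESHOLD = 1e-10   # mol/L, the level suggested in NADE90
E_TOTAL = 1e-8      # mol/L total NE (assumption)
I_TOTAL = 1e-7      # mol/L total inhibitor (assumption)

for kd in (1e-8, 1e-10, 1e-12):
    free = free_enzyme(E_TOTAL, I_TOTAL, kd)
    print(f"Kd={kd:.0e} M  free NE={free:.2e} M  below threshold: {free < THRESHOLD}")
```

With these assumed totals, an inhibitor with Kd near 10^-8 M leaves
free NE above the threshold, whereas one with Kd near 10^-12 M
brings it well below.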

The present invention is not, however, limited to any of
the above-identified target materials. The only limitation is
that the target material be suitable for affinity separation.

A supply of several milligrams of pure target material is
desired. With HNE (as discussed in Examples II and III), 400
µg of enzyme is used to prepare 200 µl of ReactiGel beads.
This amount of beads is sufficient for as many as 40
fractionations. Impure target material could be used, but one
might obtain a protein that binds to a contaminant instead of
to the target.
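
The quantities quoted above imply a simple per-fractionation
budget. The following is a small worked sketch using only the
figures in the preceding paragraph (400 µg of enzyme, 200 µl of
beads, up to 40 fractionations); the function name is hypothetical
and the even split across fractionations is an assumption.

```python
def per_fractionation(enzyme_ug=400.0, bead_volume_ul=200.0, fractionations=40):
    """Return (µg enzyme, µl beads) available per fractionation."""
    return enzyme_ug / fractionations, bead_volume_ul / fractionations


enzyme_per_run, beads_per_run = per_fractionation()
loading_ug_per_ul = 400.0 / 200.0  # enzyme loaded per µl of beads
print(enzyme_per_run, beads_per_run, loading_ug_per_ul)
```

On these figures each fractionation uses about 5 µl of beads
carrying about 10 µg of enzyme.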

The following information about the target material is
highly desirable: 1) stability as a function of temperature,
pH, and ionic strength, 2) stability with respect to
chaotropes such as urea or guanidinium Cl, 3) pI, 4) molecular
weight, 5) requirements for prosthetic groups or ions, such
as haem or Ca+2, and 6) proteolytic activity, if any. It is
also potentially useful to know: 1) the target's sequence, if
the target is a macromolecule, 2) the 3D structure of the
target, 3) enzymatic activity, if any, and 4) toxicity, if
any.

The user of the present invention specifies certain
parameters of the intended use of the binding protein: 1) the
acceptable temperature range, 2) the acceptable pH range, 3)
the acceptable concentrations of ions and neutral solutes, and
4) the maximum acceptable dissociation constant for the target
and the SBD:

   K_T = [Target][SBD] / [Target:SBD].

In some cases, the user may require discrimination between T,
the target, and N, some non-target. Let

   K_T = [T][SBD] / [T:SBD], and
   K_N = [N][SBD] / [N:SBD],

then

   K_T/K_N = ([T][N:SBD]) / ([N][T:SBD]).

The user then specifies a maximum acceptable value for the
ratio K_T/K_N.
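
The definitions above translate directly into a short
calculation. The following is a minimal sketch, not part of the
original disclosure: the function names, the example equilibrium
concentrations, and the user limits are all hypothetical and serve
only to show how K_T, K_N, and the ratio K_T/K_N would be checked
against a specification.

```python
import math


def dissociation_constant(free_ligand, free_sbd, complex_conc):
    """K = [ligand][SBD] / [ligand:SBD], all concentrations in mol/L."""
    return free_ligand * free_sbd / complex_conc


def meets_specification(k_t, k_n, max_k_t, max_ratio):
    """True if both the affinity and the discrimination limits are met."""
    return k_t <= max_k_t and (k_t / k_n) <= max_ratio


# Hypothetical equilibrium concentrations (mol/L).
t_free, sbd_free, t_sbd = 1e-9, 1e-9, 1e-6   # target T
n_free, n_sbd = 1e-6, 1e-9                   # non-target N, same free SBD

k_t = dissociation_constant(t_free, sbd_free, t_sbd)
k_n = dissociation_constant(n_free, sbd_free, n_sbd)
ratio = k_t / k_n
# Same ratio computed from the closed form given in the text.
ratio_direct = (t_free * n_sbd) / (n_free * t_sbd)
print(k_t, k_n, ratio, meets_specification(k_t, k_n, max_k_t=1e-11, max_ratio=1e-3))
```

Note that [SBD] cancels in the ratio, so discrimination can be
assessed without knowing the free SBD concentration.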

The target material preferably is stable under the
specified conditions of pH, temperature, and solution
conditions.

If the target material is a protease, one considers the
following points:
1) a highly specific protease can be treated like any other
   target,
2) a general protease, such as subtilisin, may degrade the
   OSPs of the GP including OSP-PBDs; there are several
   alternative ways of dealing with general proteases,
   including: a) use a protease inhibitor as PPBD so that
   the SBD is an inhibitor of the protease, b) a chemical
   inhibitor may be used to prevent proteolysis (e.g.,
   phenylmethylsulfonyl fluoride (PMSF), which inhibits serine
   proteases), c) one or more active-site residues may be
   mutated to create an inactive protein (e.g., a serine
   protease in which the active serine is mutated to
   alanine), or d) one or more active-site amino acids of
   the protein may be chemically modified to destroy the
   catalytic activity (e.g., a serine protease in which the
   active serine is converted to anhydroserine),
3) SBDs selected for binding to a protease need not be
   inhibitors; SBDs that happen to inhibit the protease
   target are a fairly small subset of SBDs that bind to the
   protease target,
4) the more we modify the target protease, the less likely we
   are to obtain an SBD that inhibits the target protease,
   and
5) if the user requires that the SBD inhibit the target
   protease, then the active site of the target protease
   must not be modified any more than necessary;
   inactivation by mutation or by chemical modification are
   preferred methods of inactivation and a protein protease
   inhibitor becomes a prime candidate for IPBD. For
   example, BPTI has been mutated, by the methods of the
   present invention, to bind to proteases other than
   trypsin.

Examples III-VI disclose that uninhibited serine
proteases may be used as targets quite successfully and that
protein protease inhibitors derived from BPTI and selected for
binding to these immobilized proteases are excellent
inhibitors.
V.F. Immobilization or Labeling of Target Material

For chromatography, FACS, or electrophoresis there may be
a need to covalently link the target material to a second
chemical entity. For chromatography the second entity is a
matrix, for FACS the second entity is a fluorescent dye, and
for electrophoresis the second entity is a strongly charged
molecule. In many cases, no coupling is required because the
target material already has the desired property of: a)
immobility, b) fluorescence, or c) charge. In other cases,
chemical or physical coupling is required.

Various means may be used to immobilize or label the
target materials. The means of immobilization or labeling is,
in part, determined by the nature of the target. In
particular, the physical and chemical nature of the target
material and its functional groups determine which
types of immobilization reagents may be most easily used.

For the purpose of selecting an immobilization method, it
may be more helpful to classify target materials as follows:
(a) solid, whether crystalline or amorphous, and insoluble in
an aqueous solvent (e.g., many minerals, and fibrous organics
such as cellulose and silk); (b) solid, whether crystalline or
amorphous, and soluble in an aqueous solvent; (c) liquid, but
insoluble in aqueous phase (e.g., 2,3,3-trimethyldecane); or
(d) liquid, and soluble in aqueous media.

It is not necessary that the actual target material be
used in preparing the immobilized or labeled analogue that is
to be used in affinity separation; rather, suitable reactive
analogues of the target material may be more convenient. If
2,3,3-trimethyldecane were the target material, for example,
then 2,3,3-trimethyl-10-aminodecane would be far easier to
immobilize than the parental compound. Because the latter
compound is modified at one end of the chain, it retains
almost all of the shape and charge attributes that
differentiate the former compound from other alkanes.

Target materials that do not have reactive functional
groups may be immobilized by first creating a reactive
functional group through the use of some powerful reagent,
such as a halogen. For example, an alkane can be immobilized
for affinity separation by first halogenating it and then
reacting the halogenated derivative with an immobilized or
immobilizable amine.

In some cases, the reactive groups of the actual target
material may occupy a part on the target molecule that is to
be left undisturbed. In that case, additional functional
groups may be introduced by synthetic chemistry. For example,
the most reactive groups in cholesterol are on the steroid
ring system, viz., -OH and >C=C<. We may wish to leave this
ring system as it is so that it binds to the novel binding
protein. In this case, we prepare an analogue having a
reactive group attached to the aliphatic chain (such as
26-aminocholesterol) and immobilize this derivative in a manner
appropriate to the reactive group so attached.

Two very general methods of immobilization are widely
used. The first is to biotinylate the compound of interest
and then bind the biotinylated derivative to immobilized
avidin. The second method is to generate antibodies to the
target material, immobilize the antibodies by any of numerous
methods, and then bind the target material to the immobilized
antibodies. Use of antibodies is more appropriate for larger
target materials; small targets (those comprising, for
example, ten or fewer non-hydrogen atoms) may be so completely
engulfed by an antibody that very little of the target is
exposed in the target-antibody complex.

Non-covalent immobilization of hydrophobic molecules
without resort to antibodies may also be used. A compound,
such as 2,3,3-trimethyldecane, is blended with a matrix
precursor, such as sodium alginate, and the mixture is
extruded into a hardening solution. The resulting beads will
have 2,3,3-trimethyldecane dispersed throughout and exposed
on the surface.

Other immobilization methods depend on the presence of
particular chemical functionalities. A polypeptide will
present -NH2 (N-terminal; Lysines), -COOH (C-terminal;
Aspartic Acids; Glutamic Acids), -OH (Serines; Threonines;
Tyrosines), and -SH (Cysteines). A polysaccharide has free
-OH groups, as does DNA, which has a sugar backbone.

The following table is a nonexhaustive review of reactive
functional groups and potential immobilization reagents:

Group              Reagent

R-NH2              Derivatives of 2,4,6-trinitrobenzene
                   sulfonates (TNBS) (CREI84, p. 11)
R-NH2              Carboxylic acid anhydrides, e.g.,
                   derivatives of succinic anhydride,
                   maleic anhydride, citraconic anhydride
                   (CREI84, p. 11)
R-NH2              Aldehydes that form reducible Schiff
                   bases (CREI84, p. 12)
guanido            Cyclohexanedione derivatives
                   (CREI84, p. 14)
R-CO2H             Diazo compounds (CREI84, p. 10)
R-CO2-             Epoxides (CREI84, p. 10)
R-OH               Carboxylic acid anhydrides
Aryl-OH            Carboxylic acid anhydrides
Indole ring        Benzyl halide and sulfenyl halides
                   (CREI84, p. 19)
R-SH               N-alkylmaleimides (CREI84, p. 21)
R-SH               Ethyleneimine derivatives
                   (CREI84, p. 21)
R-SH               Aryl mercury compounds (CREI84, p. 21)
R-SH               Disulfide reagents (CREI84, p. 23)
Thiol ethers       Alkyl iodides (CREI84, p. 20)
Ketones            Make Schiff's base and reduce with
                   NaBH4 (CREI84, pp. 12-13)
Aldehydes          Oxidize to COOH, vide supra.
R-SO3H             Convert to R-SO2Cl and react with
                   immobilized alcohol or amine.
R-PO3H             Convert to R-PO2Cl and react with
                   immobilized alcohol or amine.
CC double bonds    Add HBr and then make amine or thiol.

The next table identifies the reactive groups of a number of
potential targets.

Compound (Item#, page)*            Reactive groups or [derivatives]

prostaglandin E2 (2893, 1251)      -OH, keto, -COOH, C=C
aspartame (861, 132)               -NH2, -COOH, -COOCH3
haem (4558, 732)                   vinyl, -COOH, Fe
bilirubin (1235, 189)              vinyl, -COOH, keto, -NH-
morphine (6186, 988)               -OH, -C=C-, reactive phenyl ring
codeine (2459, 384)                -OH, -C=C-, reactive phenyl ring
dichlorodiphenyltrichloroethane    aromatic chlorine,
  (2832, 446)                      aliphatic chlorine
benzo(a)pyrene (1113, 172)         [Chlorinate -> amine, or make
                                   sulfonate -> Aryl-SO2Cl]
actinomycin D (2804, 441)          aryl-NH2, -OH
cellulose                          self immobilized
hydroxylapatite                    self immobilized
cholesterol (2204, 341)            -OH, >C=C-

*Note: Item# and page refer to The Merck Index, 11th Edition.

The extensive literature on affinity chromatography and
related techniques will provide further examples.

Matrices suitable for use as support materials include
polystyrene, glass, agarose and other chromatographic
supports, and may be fabricated into beads, sheets, columns,
wells, and other forms as desired. Suppliers of support
material for affinity chromatography include: Applied Protein
Technologies, Cambridge, MA; Bio-Rad Laboratories, Rockville
Center, NY; Pierce Chemical Company, Rockford, IL. Target
materials are attached to the matrix in accord with the
directions of the manufacturer of each matrix preparation with
consideration of good presentation of the target.

Early in the selection process, relatively high
concentrations of target materials may be applied to the
matrix to facilitate binding; target concentrations may
subsequently be reduced to select for higher affinity SBDs.
V.G. Elution of Lower Affinity PBD-Bearing Genetic Packages

The population of GPs is applied to an affinity matrix
under conditions compatible with the intended use of the
binding protein and the population is fractionated by passage
of a gradient of some solute over the column. The process
enriches for PBDs having affinity for the target and for which
the affinity for the target is least affected by the eluants
used. The enriched fractions are those containing viable GPs
that elute from the column at greater concentration of the
eluant.

The eluants preferably are capable of weakening
noncovalent interactions between the displayed PBDs and the
immobilized target material. Preferably, the eluants do not
kill the genetic package; the genetic message corresponding to
successful mini-proteins is most conveniently amplified by
reproducing the genetic package rather than by in vitro
procedures such as PCR. The list of potential eluants
includes salts (including Na+, NH4+, Rb+, SO4--, H2PO4-,
citrate, K+, Li+, Cs+, HSO4-, CO3--, Ca++, Sr++, Cl-, PO4---,
HCO3-, Mg++, Ba++, Br-, HPO4-- and acetate), acid, heat,
compounds known to bind the target, and soluble target
material (or analogues thereof).

Because bacteria continue to metabolize during affinity
separation, the choice of buffer components is more restricted
for bacteria than for bacteriophage or spores. Neutral
solutes, such as ethanol, acetone, ether, or urea, are
frequently used in protein purification and are known to
weaken non-covalent interactions between proteins and other
molecules. Many of these species are, however, very harmful
to bacteria and bacteriophage. Urea is known not to harm M13
up to 8 M. Bacterial spores, on the other hand, are
impervious to most neutral solutes. Several affinity
separation passes may be made within a single round of
variegation. Different solutes may be used in different
analyses, salt in one, pH in the next, etc.

Any ions or cofactors needed for stability of PBDs
(derived from IPBD) or target are included in initial and
elution buffers at appropriate levels. We first remove
GP(PBD)s that do not bind the target by washing the matrix
with the initial buffer. We determine that this phase of
washing is complete by plating aliquots of the washes or by
measuring the optical density (at 260 nm or 280 nm). The
matrix is then eluted with a gradient of increasing: a) salt,
b) [H+] (decreasing pH), c) neutral solutes, d) temperature
(increasing or decreasing), or e) some combination of these
factors. The solutes in each of the first three gradients
have been found generally to weaken non-covalent interactions
between proteins and bound molecules. Salt is a preferred
solute for gradient formation in most cases. Decreasing pH is
also a highly preferred eluant. In some cases, the preferred
matrix is not stable to low pH so that salt and urea are the
most preferred reagents. Other solutes that generally weaken
non-covalent interaction between proteins and the target
material of interest may also be used.

The uneluted genetic packages contain DNA encoding
binding domains which have a sufficiently high affinity for
the target material to resist the elution conditions. The DNA
encoding such successful binding domains may be recovered in a
variety of ways. Preferably, the bound genetic packages are
simply eluted by means of a change in the elution conditions.
Alternatively, one may culture the genetic package in situ, or
extract the target-containing matrix with phenol (or other
suitable solvent) and amplify the DNA by PCR or by recombinant
DNA techniques. Additionally, if a site for a specific
protease has been engineered into the display vector, the
specific protease is used to cleave the binding domain from
the GP.

V.H. Optimization of Affinity Chromatography Separation

For linear gradients, elution volume and eluant
concentration are directly related. Changes in eluant
concentration cause GPs to elute from the column. Elution
volume, however, is more easily measured and specified. It is
to be understood that the eluant concentration is the agent
causing GP release and that an eluant concentration can be
calculated from an elution volume and the specified gradient.
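
The conversion from elution volume to eluant concentration
mentioned above is a linear interpolation along the gradient. The
following is a minimal sketch, not part of the original
disclosure: the function name and the example gradient (0.05 M to
2.0 M salt over 100 ml) are hypothetical, and column dead volume
is ignored for simplicity.

```python
import math


def eluant_concentration(elution_volume, start_conc, end_conc, gradient_volume):
    """Eluant concentration reached at a given volume of a linear gradient."""
    if not 0.0 <= elution_volume <= gradient_volume:
        raise ValueError("elution volume must lie within the gradient")
    fraction = elution_volume / gradient_volume
    return start_conc + (end_conc - start_conc) * fraction


# Hypothetical gradient: 0.05 M to 2.0 M over 100 ml; GP elutes at 40 ml.
conc_at_40_ml = eluant_concentration(40.0, 0.05, 2.0, 100.0)
print(f"{conc_at_40_ml:.2f} M")
```

In practice one would subtract any known delay volume between the
gradient mixer and the column outlet before applying the formula.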

Using a specified elution regime, we compare the elution
volumes of GP(IPBD)s with the elution volumes of wtGP on
affinity columns supporting AfM(IPBD). Comparisons are made
at various: a) amounts of IPBD/GP, b) densities of
AfM(IPBD)/(volume of matrix) (DoAMoM), c) initial ionic
strengths, d) elution rates, e) amounts of GP/(volume of
support), f) pHs, and g) temperatures, because these are the
parameters most likely to affect the sensitivity and
efficiency of the separation. We then pick those conditions
giving the best separation.

We do not optimize pH or temperature; rather we record
optimal values for the other parameters for one or more values
of pH and temperature. The pH used must be within the range
of pH for which GP(IPBD) binds the AfM(IPBD) that is being
used in this step. The conditions of intended use specified
by the user may include a specification of pH or temperature.
If pH is specified, then pH will not be varied in eluting the
column. Decreasing pH may, however, be used to liberate bound
GPs from the matrix. Similarly, if the intended use specifies
a temperature, we will hold the affinity column at the
specified temperature during elution, but we might vary the
temperature during recovery. If the intended use specifies
the pH or temperature, then we prefer that the affinity
separation be optimized for all other parameters at the
specified pH and temperature.

In the optimization devised in this step, we preferably
use a molecule known to have moderate affinity for the IPBD (Kd
in the range 10^-6 M to 10^-8 M), for the following reason. When
populations of GP(vgPBD)s are fractionated, there will be
roughly three subpopulations: a) those with no binding, b)
those that have some binding but can be washed off with high
salt or low pH, and c) those that bind very tightly and are
most easily rescued in situ. We optimize the parameters to
separate (a) from (b) rather than (b) from (c). Let PBD_W be a
PBD having weak binding to the target and PBD_S be a PBD having
strong binding. Higher DoAMoM might, for example, favor
retention of GP(PBD_W) but also make it very difficult to elute
viable GP(PBD_S). We will optimize the affinity separation to
retain GP(PBD_W) rather than to allow release of GP(PBD_S)
because a tightly bound GP(PBD_S) can be rescued by in situ
growth. If we find that DoAMoM strongly affects the elution
volume, then in part III we may reduce the amount of target on
the affinity column when an SBD has been found with moderately
strong affinity (Kd on the order of 10^-7 M) for the target.

In case the promoter of the osp-ipbd gene is not 
regulated by a chemical inducer, we optimize DoAMoM, the 
elution rate, and the amount of GP/volume of matrix. If the 
optimized affinity separation is acceptable, we proceed. If 
not, we develop a means to alter the amount of IPBD per GP. 
Among GPs considered in the present invention, this case could 
arise only for spores because regulatable promoters are 
available for all other systems. 

If the amount of IPBD/spore is too high, we could 
engineer an operator site into the osp-ipbd gene. We choose 
the operator sequence such that a repressor sensitive to a 
small diffusible inducer recognizes the operator. 
Alternatively, we could alter the Shine-Dalgarno sequence to 
produce a lower homology with consensus Shine-Dalgarno 
sequences. If the amount of IPBD/spore is too low, we can 
introduce variability into the promoter or Shine-Dalgarno 
sequences and screen colonies for higher amounts of 
IPBD/spore. 

In this step, we measure elution volumes of genetically 
pure GPs that elute from the affinity matrix as sharp bands 
that can be detected by UV absorption. Alternatively, samples 
from effluent fractions can be plated on suitable medium 
(cells or spores) or on sensitive cells (phage) and colonies 
or plaques counted. 

Several values of IPBD/GP, DoAMoM, elution rates, initial 
ionic strengths, and loadings should be examined. The 
following is only one of many ways in which the affinity 
separation could be optimized. We anticipate that optimal 
values of IPBD/GP and DoAMoM will be correlated and therefore 
should be optimized together. The effects of initial ionic 
strength, elution rate, and amount of GP/(matrix volume) are 
unlikely to be strongly correlated, and so they can be 
optimized independently. 

For each set of parameters to be tested, the column is 
eluted in a specified manner. For example, we may use a 
regime called Elution Regime 1: a KCl gradient runs from 10 mM 
to the maximum allowed for GP(IPBD) viability in 100 fractions 
of 0.05 V_V, followed by 20 fractions of 0.05 V_V at maximum 
allowed KCl; pH of the buffer is maintained at the specified 
value with a convenient buffer such as phosphate, Tris, or 
MOPS. Other elution regimes can be used; what is important is 
that the conditions of this optimization be similar to the 
conditions that are used in Part III for selection for binding 
to target and recovery of GPs from the chromatographic system. 
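
The mapping from fraction number to salt concentration under 
Elution Regime 1 can be sketched as follows. This is only an 
illustration: the text does not state the shape of the 
gradient, so a linear gradient is assumed, and the function 
names and the 1000 mM ceiling are hypothetical placeholders for 
the maximum KCl tolerated by the particular GP(IPBD). 

```python
def kcl_concentration_mM(fraction_index, start_mM=10.0, max_mM=1000.0,
                         gradient_fractions=100, hold_fractions=20):
    """KCl concentration (mM) at the start of a 0-based fraction.

    Assumes a LINEAR gradient (not stated in the text) over the first
    `gradient_fractions` fractions, then a hold at `max_mM`.
    `max_mM` is a placeholder for the GP-specific viability limit.
    """
    total = gradient_fractions + hold_fractions
    if not 0 <= fraction_index < total:
        raise ValueError("fraction index outside the elution regime")
    if fraction_index >= gradient_fractions:
        return max_mM  # hold phase at maximum allowed KCl
    step = (max_mM - start_mM) / gradient_fractions
    return start_mM + step * fraction_index


def elution_volume_vv(fraction_index, fraction_size_vv=0.05):
    """Cumulative eluted volume, in units of V_V, once the given
    0-based fraction has been fully collected."""
    return (fraction_index + 1) * fraction_size_vv
```

With these assumptions, a GP first seen in a given fraction can 
be assigned both an elution volume and the salt concentration 
at which it was released. 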

When the osp-ipbd gene is regulated by [XINDUCE], IPBD/GP 
can be controlled by varying [XINDUCE]. Appropriate values 
of [XINDUCE] depend on the identity of [XINDUCE] and the 
promoter; if, for example, XINDUCE is isopropylthiogalactoside 
(IPTG) and the promoter is lacUV5, then [IPTG] = 0, 0.1 µM, 
1.0 µM, 10.0 µM, 100.0 µM, and 1.0 mM would be appropriate 
levels to test. The range of variation of [XINDUCE] is 
extended until an optimum is found or an acceptable level of 
expression is obtained. 

DoAMoM is varied from the maximum that the matrix 
material can bind to 1% or 0.1% of this level in appropriate 
steps. We anticipate that the efficiency of separation will 
be a smooth function of DoAMoM so that it is appropriate to 
cover a wide range of values for DoAMoM with a coarse grid and 
then explore the neighborhood of the approximate optimum with 
a finer grid. 
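
The coarse-then-fine exploration of DoAMoM might be organized 
as in the following sketch. It is an illustration under stated 
assumptions, not a prescribed procedure: the logarithmic 
spacing, the grid sizes, and the callback measure_separation 
(standing in for an actual column run that returns a 
figure of merit, larger being better) are all hypothetical. 

```python
import math


def log_grid(low, high, points):
    """Return `points` values evenly spaced on a log10 scale."""
    if points < 2:
        raise ValueError("need at least two grid points")
    lo, hi = math.log10(low), math.log10(high)
    step = (hi - lo) / (points - 1)
    return [10 ** (lo + i * step) for i in range(points)]


def coarse_to_fine(measure_separation, max_density, min_fraction=0.001,
                   coarse_points=7, fine_points=7):
    """Coarse scan from min_fraction*max_density up to max_density,
    then a finer scan between the neighbors of the best coarse point.

    `measure_separation(density)` is a hypothetical stand-in for an
    experiment; it must return a score where larger is better.
    """
    coarse = log_grid(max_density * min_fraction, max_density, coarse_points)
    scores = [measure_separation(d) for d in coarse]
    best = max(range(len(coarse)), key=scores.__getitem__)
    low = coarse[max(best - 1, 0)]
    high = coarse[min(best + 1, len(coarse) - 1)]
    fine = log_grid(low, high, fine_points)
    return max(fine, key=measure_separation)
```

The approach relies on the stated expectation that separation 
efficiency is a smooth function of DoAMoM; if it were not, the 
fine grid could miss the true optimum. 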

Several values of initial ionic strength are tested, such 
as 1.0 mM, 5.0 mM, 10.0 mM and 20.0 mM. Low ionic strength 
favors binding between oppositely charged groups, but could 
also cause GP to precipitate. 

The elution rate is varied, by successive factors of 1/2, 
from the maximum attainable rate to 1/16 of this value. If 
the lowest elution rate tested gives the best separation, we 
test lower elution rates until we find an optimum or adequate 
separation. 
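
The halving schedule for the elution rate, including the rule 
for extending it, can be sketched as below. The callback 
measure_separation is a hypothetical stand-in for a column run 
(larger score meaning better separation), and the cap on extra 
halvings is our own assumption, added only so that the loop 
terminates. The sketch stops on an optimum; the "adequate 
separation" criterion is left to the user. 

```python
def find_elution_rate(measure_separation, max_rate, initial_halvings=4,
                      max_extra_halvings=6):
    """Test max_rate, max_rate/2, ..., max_rate/16, then keep halving
    while the slowest rate tested is still the best one.

    Returns (best_rate, all_rates_tested).
    """
    rates = [max_rate / 2 ** i for i in range(initial_halvings + 1)]
    scores = [measure_separation(r) for r in rates]
    extra = 0
    # Extend the series only while the lowest rate gives the best score.
    while (scores.index(max(scores)) == len(rates) - 1
           and extra < max_extra_halvings):
        rates.append(rates[-1] / 2)
        scores.append(measure_separation(rates[-1]))
        extra += 1
    best = scores.index(max(scores))
    return rates[best], rates
```
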

The goal of the optimization is to obtain a sharp 
transition between bound and unbound GPs, triggered by 
increasing salt or decreasing pH or a combination of both. 
This optimization need be performed only: a) for each 
temperature to be used, b) for each pH to be used, and c) when 
a new GP(IPBD) is created. 

V.I. Measuring the sensitivity of affinity separation: 

Once the values of IPBD/GP, DoAMoM, initial ionic 
strength, elution rate, and amount of GP/(volume of affinity 
support) have been optimized, we determine the sensitivity of 
the affinity separation (C_sensi) by the following procedure 
that measures the minimum quantity of GP(IPBD) that can be 
detected in the presence of a large excess of wtGP. The user 
chooses a number of separation cycles, denoted N_chrom, that 
will be performed before an enrichment is abandoned; 
preferably, N_chrom is in the range 6 to 10 and N_chrom must 
be greater than 4. Enrichment can be terminated by isolation 
of a desired GP(SBD) before N_chrom passes. 

The measurement of sensitivity is significantly expedited 
if GP(IPBD) and wtGP carry different selectable markers 
because such markers allow easy identification of colonies 
obtained by plating fractions obtained from the chromatography 
column. For example, if wtGP carries kanamycin resistance and 
GP(IPBD) carries ampicillin resistance, we can plate fractions 
from a column on non-selective media suitable for the GP. 
Transfer of colonies onto ampicillin- or kanamycin-containing 
media will determine the identity of each colony. 

Mixtures of GP(IPBD) and wtGP are prepared in the ratios 
of 1:V_lim, where V_lim ranges by an appropriate factor (e.g. 
1/10) over an appropriate range, typically 10^11 through 10^4. 
Large values of V_lim are tested first; once a positive result 
is obtained for one value of V_lim, no smaller values of V_lim 
need be tested. Each mixture is applied to a column 
supporting, at the optimal DoAMoM, an AfM(IPBD) having high 
affinity for IPBD and the column is eluted by the specified 
elution regime, such as Elution Regime 1. The last fraction 
that contains viable GPs and an inoculum of the column matrix 
material are cultured. If GP(IPBD) and wtGP have different 
selectable markers, then transfer onto selection plates 
identifies each colony. If GP(IPBD) and wtGP have no 
selectable markers or the same selectable markers, then a 
number (e.g. 32) of GP clonal isolates are tested for presence 
of IPBD. If IPBD is not detected on the surface of any of the 
isolated GPs, then GPs are pooled from: a) the last few (e.g. 
3 to 5) fractions that contain viable GPs, and b) an inoculum 
taken from the column matrix. The pooled GPs are cultured and 
passed over the same column and enriched for GP(IPBD) in the 
manner described. This process is repeated until N_chrom 
passes have been performed, or until the IPBD has been 
detected on the GPs. If GP(IPBD) is not detected after N_chrom 
passes, V_lim is decreased and the process is repeated. 

Once a value for V_lim is found that allows recovery of 
GP(IPBD)s, the factor by which V_lim is varied is reduced and 
additional values are tested until V_lim is known to within a 
factor of two. 
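
The search for V_lim, first in decades from large to small and 
then by refinement to within a factor of two, can be sketched 
as follows. The callback can_recover is hypothetical: it 
stands for the whole experiment at one dilution (up to N_chrom 
passes) and returns True when GP(IPBD) is recovered. The use 
of the geometric mean for the refinement step is our own 
choice; the text says only that the factor is reduced. 

```python
import math


def find_v_lim(can_recover, start_exp=11, stop_exp=4):
    """Return the largest tested V_lim at which recovery succeeds,
    bracketed to within a factor of two, or None if none succeeds."""
    failed = None   # smallest V_lim known to fail
    passed = None   # largest V_lim known to succeed
    for exp in range(start_exp, stop_exp - 1, -1):  # large values first
        v = 10.0 ** exp
        if can_recover(v):
            passed = v
            break   # no smaller values need be tested
        failed = v
    if passed is None:
        return None
    if failed is None:
        return passed   # succeeded at the largest value tried
    # Refine between the last success and the last failure.
    while failed / passed > 2.0:
        mid = math.sqrt(failed * passed)
        if can_recover(mid):
            passed = mid
        else:
            failed = mid
    return passed
```
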

C_sensi equals the highest value of V_lim for which the user 
can recover GP(IPBD) within N_chrom passes. The number of 
chromatographic cycles (K_cyc) that were needed to isolate 
GP(IPBD) gives a rough estimate of C_eff; C_eff is 
approximately the K_cyc-th root of V_lim: 

     C_eff ≈ exp{ log_e(V_lim) / K_cyc } 

For example, if V_lim were 4.0 x 10^8 and three separation 
cycles were needed to isolate GP(IPBD), then C_eff ≈ 736. 
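
The rough estimate of C_eff is a one-line calculation; the 
sketch below (function name ours) reproduces the worked 
example. 

```python
import math


def estimate_c_eff(v_lim, k_cyc):
    """Rough C_eff: the k_cyc-th root of v_lim, written as in the text."""
    if v_lim <= 0 or k_cyc < 1:
        raise ValueError("v_lim must be positive and k_cyc at least 1")
    return math.exp(math.log(v_lim) / k_cyc)


# Worked example from the text: V_lim = 4.0e8, three cycles.
# The result lies between 736 and 737.
example = estimate_c_eff(4.0e8, 3)
```
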

V.J. Measuring the efficiency of separation: 

To determine C_eff more accurately, we determine the ratio 
of GP(IPBD)/wtGP loaded onto an AfM(IPBD) column that yields 
approximately equal amounts of GP(IPBD) and wtGP after 
elution. We prepare mixtures of GP(IPBD) and wtGP in ratios 
GP(IPBD):wtGP :: 1:Q; we start Q at twenty times the 
approximate C_eff found above. A 1:Q mixture of GP(IPBD) and 
wtGP is applied to an AfM(IPBD) column and eluted by the 
specified elution regime, such as Elution Regime 1. A sample 
of the last fraction that contains viable GPs is plated at a 
dilution that gives well separated colonies or plaques. The 
presence of IPBD or the osp-ipbd gene in each colony or plaque 
can be determined by a number of standard methods, including: 
a) use of different selectable markers, b) nitrocellulose 
filter lift of GPs and detection with AfM(IPBD)* (AUSU87), or 
c) nitrocellulose filter lift of GPs and detection with 
radiolabeled DNA that is complementary to the osp-ipbd gene 
(AUSU87). Let F be the fraction of GP(IPBD) colonies found in 
the last fraction containing viable GPs. When a Q is found 
such that 0.20 < F < 0.80, then 

     C_eff = Q * F. 

If F < 0.2, then we reduce Q by an appropriate factor 
(e.g. 1/10) and repeat the procedure. If F > 0.8, then we 
increase Q by an appropriate factor (e.g. 2) and repeat the 
procedure. 
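
The adjustment of Q described above amounts to a simple loop, 
sketched here. The callback measure_fraction is hypothetical: 
it stands for loading a 1:Q mixture, eluting, plating, and 
counting, and returns F. The limit on the number of rounds and 
the handling of F exactly equal to 0.20 or 0.80 (treated as out 
of range) are our assumptions. 

```python
def measure_c_eff(measure_fraction, q_start, low=0.20, high=0.80,
                  reduce_divisor=10.0, increase_factor=2.0, max_rounds=20):
    """Adjust Q until low < F < high, then return (C_eff, Q, F)
    with C_eff = Q * F as given in the text."""
    q = q_start
    for _ in range(max_rounds):
        f = measure_fraction(q)
        if low < f < high:
            return q * f, q, f
        if f <= low:
            q = q / reduce_divisor      # too few GP(IPBD): reduce Q
        else:
            q = q * increase_factor     # too many GP(IPBD): increase Q
    raise RuntimeError("no Q found with F in the accepted range")
```

For instance, starting at twenty times a rough C_eff of 600 
(Q = 12000), a hypothetical reading of F = 0.05 leads to 
Q = 1200; if that mixture then gives F = 0.35, the procedure 
stops and reports C_eff = 1200 x 0.35. 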

V.K. Reducing selection due to non-specific binding: 

When affinity chromatography is used for separating bound 
and unbound GPs, we may reduce non-specific binding of 
GP(PBD)s to the matrix that bears the target in the following 
ways: 

1) we treat the column with blocking agents such as 
   genetically defective GPs or a solution of protein before 
   the population of GP(vgPBD)s is chromatographed, and 

2) we pass the population of GP(vgPBD)s over a matrix 
   containing no target or a different target from the same 
   class as the actual target prior to affinity 
   chromatography. 

Step (1) above saturates any non-specific binding that the 
affinity matrix might show toward wild-type GPs or proteins in 
general; step (2) removes components of our population that 
exhibit non-specific binding to the matrix or to molecules of 
the same class as the target. If the target were horse heart 
myoglobin, for example, a column supporting bovine serum 
albumin could be used to trap GPs exhibiting PBDs with strong 
non-specific binding to proteins. If cholesterol were the 
target, then a hydrophobic compound, such as 
p-tertiarybutylbenzyl alcohol, could be used to remove GPs 
displaying PBDs having strong non-specific binding to 
hydrophobic compounds. It is anticipated that PBDs that fail 
to fold or that are prematurely terminated will be 
non-specifically sticky. These sequences could outnumber the 
PBDs having desirable binding properties. Thus, the capacity 
of the initial column that removes indiscriminately adhesive 
PBDs should be greater (e.g. 5 fold greater) than that of the 
column that supports the target molecule. 

Variation in the support material (polystyrene, glass, 
agarose, cellulose, etc.) in analysis of clones carrying SBDs 
is used to eliminate enrichment for packages that bind to the 
support material rather than the target. 

FACS may be used to separate GPs that bind 
fluorescent-labeled target. We discriminate against 
artifactual binding to the fluorescent label by using two or 
more different dyes, chosen to be structurally different. GPs 
isolated using target labeled with a first dye are cultured. 
These GPs are then tested with target labeled with a second 
dye. 

Electrophoretic affinity separation uses unaltered target 
so that only other ions in the buffer can give rise to 
artifactual binding. Artifactual binding to the gel material 
gives rise to retardation independent of field direction and 
so is easily eliminated. 

A variegated population of GPs will have a variety of 
charges. The following 2D electrophoretic procedure 
accommodates this variation in the population. First the 
variegated population of GPs is electrophoresed in a gel that 
contains no target material. The electrophoresis continues 
until the GPs are distributed along the length of the lane. 
The gels described by Serwer for phage are very low in agarose 
and lack mechanical stability. The target-free lane in which 
the initial electrophoresis is conducted is separated from a 
square of gel that contains target material by a removable 
baffle. After the first pass, the baffle is removed and a 
second electrophoresis is conducted at right angles to the 
first. GPs that do not bind target migrate with unaltered 
mobility while GPs that do bind target will separate from the 
majority that do not bind target. A diagonal line of 
non-binding GPs will form. This line is excised and discarded. 
Other parts of the gel are dissolved and the GPs cultured. 

V.L. Isolation of GP(PBD)s with binding-to-target phenotypes: 

The harvested packages are now enriched for the 
binding-to-target phenotype by use of affinity separation 
involving the target material immobilized on an affinity 
matrix. Packages that fail to bind to the target material are 
washed away. If the packages are bacteriophage or endospores, 
it may be desirable to include a bactericidal agent, such as 
azide, in the buffer to prevent bacterial growth. The buffers 
used in chromatography include: a) any ions or other solutes 
needed to stabilize the target, and b) any ions or other 
solutes needed to stabilize the PBDs derived from the IPBD. 

V.M. Recovery of packages: 

Recovery of packages that display binding to an affinity 
column may be achieved in several ways, including: 

1) collect fractions eluted from the column with a gradient 
as described above; fractions eluting later in the 
gradient contain GPs more enriched for genes encoding 
PBDs with high affinity for the column, 

2) elute the column with the target material in soluble 
form, 

3) flood the matrix with a nutritive medium and grow the 
desired packages in situ , 

4) remove parts of the matrix and use them to inoculate 
growth medium, 

5) chemically or enzymatically degrade the linkage holding 
the target to the matrix so that GPs still bound to 
target are eluted, or 

6) degrade the packages and recover DNA with phenol or other 
suitable solvent; the recovered DNA is used to transform 
cells that regenerate GPs. 

It is possible to utilize combinations of these methods. It 
should be remembered that what we want to recover from the 
affinity matrix is not the GPs per se, but the information in 
them. Recovery of viable GPs is very strongly preferred, but 
recovery of genetic material is essential. If cells, spores, 
or virions bind irreversibly to the matrix but are not killed, 
we can recover the information through in situ cell division, 
germination, or infection respectively. Proteolytic 
degradation of the packages and recovery of DNA is not 
preferred. 

Although degradation of the bound GPs and recovery of 
genetic material is a possible mode of operation, inadvertent 
inactivation of the GPs is very deleterious. It is preferred 
that maximum limits for solutes that do not inactivate the GPs 
or denature the target or the column are determined. If the 
affinity matrices are expendable, one may use conditions that 
denature the column to elute GPs; before the target is 
denatured, a portion of the affinity matrix should be removed 
for possible use as an inoculum. As the GPs are held together 
by protein-protein interactions and other non-covalent 
molecular interactions, there will be cases in which the 
molecular package will bind so tightly to the target molecules 
on the affinity matrix that the GPs cannot be washed off in 
viable form. This will only occur when very tight binding has 
been obtained. In these cases, methods (3) through (5) above 
can be used to obtain the bound packages or the genetic 
messages from the affinity matrix. 

It is possible, by manipulation of the elution 
conditions, to isolate SBDs that bind to the target at one pH 
(pH_b) but not at another pH (pH_o). The population is applied 
at pH_b and the column is washed thoroughly at pH_b. The 
column is then eluted with buffer at pH_o and GPs that come 
off at the new pH are collected and cultured. Similar 
procedures may be used for other solution parameters, such as 
temperature. For example, GP(vgPBD)s could be applied to a 
column supporting insulin. After eluting with salt to remove 
GPs with little or no binding to insulin, we elute with salt 
and glucose to liberate GPs that display PBDs that bind 
insulin or glucose in a competitive manner. 

V.N. Amplifying the Enriched Packages 

Viable GPs having the selected binding trait are 
amplified by culture in a suitable medium, or, in the case of 
phage, infection into a host so cultivated. If the GPs have 
been inactivated by the chromatography, the OCV carrying the 
osp-pbd gene is recovered from the GP and introduced into a 
new, viable host. 

V.O. Determining whether further enrichment is needed: 

The probability of isolating a GP with improved binding 
increases by a factor of C_eff with each separation cycle. Let 
N be the number of distinct amino-acid sequences produced by 
the variegation. We want to perform K separation cycles before 
attempting to isolate an SBD, where K is such that the 
probability of isolating a single SBD is 0.10 or higher. 

     K = the smallest integer >= log10(0.10 N) / log10(C_eff) 

For example, if N were 1.0 x 10^7 and C_eff = 6.31 x 10^2, 
then log10(1.0 x 10^6)/log10(6.31 x 10^2) = 6.0000/2.8000 = 
2.14. Therefore we would attempt to isolate SBDs after the 
third separation cycle. After only two separation cycles, the 
probability of finding an SBD is 

     (6.31 x 10^2)^2 / (1.0 x 10^7) = 0.04 

and attempting to isolate SBDs might be profitable. 
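
The calculation of K and of the probability after a given 
number of cycles can be sketched as follows; the function 
names are ours, and the probability is simply capped at 1. 

```python
import math


def cycles_before_isolation(n_sequences, c_eff, target_probability=0.10):
    """Smallest integer K with K >= log10(p * N) / log10(C_eff)."""
    if c_eff <= 1:
        raise ValueError("C_eff must exceed 1 for enrichment to occur")
    ratio = math.log10(target_probability * n_sequences) / math.log10(c_eff)
    return math.ceil(ratio)


def probability_after(cycles, n_sequences, c_eff):
    """Approximate chance of picking an SBD after `cycles` passes,
    taken as C_eff**cycles / N and capped at 1."""
    return min(1.0, c_eff ** cycles / n_sequences)


# Worked example from the text: N = 1.0e7, C_eff = 6.31e2.
k = cycles_before_isolation(1.0e7, 6.31e2)   # 3 cycles
p2 = probability_after(2, 1.0e7, 6.31e2)     # about 0.04
```
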

Clonal isolates from the last fraction eluted which 
contained any viable GPs, as well as clonal isolates obtained 
by culturing an inoculum taken from the affinity matrix, are 
cultured in a growth step that is similar to that described 
previously. Other fractions may be cultured too. If K 
separation cycles have been completed, samples from a number, 
e.g. 32, of these clonal isolates are tested for elution 
properties on the {target} column. If none of the isolated, 
genetically pure GPs show improved binding to target, or if K 
cycles have not yet been completed, then we pool and culture, 
in a manner similar to the manner set forth previously, the 
GPs from the last few fractions eluted that contained viable 
GPs and the GPs obtained by culturing an inoculum taken from 
the column matrix. We then repeat the enrichment procedure 
described above. This cyclic enrichment may continue for 
N_chrom passes or until an SBD is isolated. 

If one or more of the isolated GPs has improved retention 
on the {target} column, we determine whether the retention of 
the candidate SBDs is due to affinity for the target material 
as follows. A second column is prepared using a different 
support matrix with the target material bound at the optimal 
density. The elution volumes, under the same elution 
conditions as used previously, of candidate GP(SBD)s are 
compared to each other and to GP(PPBD of this round). If one 
or more candidate GP(SBD)s has a larger elution volume than 
GP(PPBD of this round), then we pick the GP(SBD) having the 
highest elution volume and proceed to characterize the 
population. If none of the candidate GP(SBD)s has higher 
elution volume than GP(PPBD of this round), then we pool and 
culture, in a manner similar to the manner used previously, 
the GPs from the last few fractions that contained viable GPs 
and the GPs obtained by culturing an inoculum taken from the 
column matrix. We then repeat the enrichment procedure. 

If all of the SBDs show binding that is superior to PPBD 
of this round, we pool and culture the GPs from the last 
fraction that contains viable GPs and from the inoculum taken 
from the column. This population is re-chromatographed at 
least one pass to fractionate further the GPs based on K_d. 

If an RNA phage were used as GP, the RNA would either be 
cultured with the assistance of a helper phage or be reverse 
transcribed and the DNA amplified. The amplified DNA could 
then be sequenced or subcloned into suitable plasmids. 

V.P. Characterizing the Putative SBDs: 

We characterize members of the population showing desired 
binding properties by genetic and biochemical methods. We 
obtain clonal isolates and test these strains by genetic and 
affinity methods to determine genotype and phenotype with 
respect to binding to target. For several genetically pure 
isolates that show binding, we demonstrate that the binding is 
caused by the artificial chimeric gene by excising the osp-sbd 
gene and crossing it into the parental GP. We also ligate the 
deleted backbone of each GP from which the osp-sbd is removed 
and demonstrate that each backbone alone cannot confer binding 
to the target on the GP. We sequence the osp-sbd gene from 
several clonal isolates. Primers for sequencing are chosen 
from the DNA flanking the osp-ppbd gene or from parts of the 
osp-ppbd gene that are not variegated. 

The present invention is not limited to a single method 
of determining protein sequences, and reference in the 
appended claims to determining the amino acid sequence of a 
domain is intended to include any practical method or 
combination of methods, whether direct or indirect. The 
preferred method, in most cases, is to determine the sequence 
of the DNA that encodes the protein and then to infer the 
amino acid sequence. In some cases, standard methods of 
protein-sequence determination may be needed to detect 
post-translational processing. 

The present invention is not limited to a single method 
of determining the sequence of nucleotides (nts) in DNA 
subsequences. In the preferred embodiment, plasmids are 
isolated and denatured in the presence of a sequencing primer, 
about 20 nts long, that anneals to a region adjacent, on the 
5' side, to the region of interest. This plasmid is then used 
as the template in the four sequencing reactions with one 
dideoxy substrate in each. Sequencing reactions, agarose gel 
electrophoresis, and polyacrylamide gel electrophoresis (PAGE) 
are performed by standard procedures (AUSU87). 

For one or more clonal isolates, we may subclone the sbd 
gene fragment, without the osp fragment, into an expression 
vector such that each SBD can be produced as a free protein. 
Because numerous unique restriction sites were built into the 
inserted domain, it is easy to subclone the gene at any time. 
Each SBD protein is purified by normal means, including 
affinity chromatography. Physical measurements of the 
strength of binding are then made on each free SBD protein by 
one of the following methods: 1) alteration of the Stokes 
radius as a function of binding of the target material, 
measured by characteristics of elution from a molecular sizing 
column such as agarose, 2) retention of radiolabeled binding 
protein on a spun affinity column to which has been affixed 
the target material, or 3) retention of radiolabeled target 
material on a spun affinity column to which has been affixed 
the binding protein. The measurements of binding for each 
free SBD are compared to the corresponding measurements of 
binding for the PPBD. 

In each assay, we measure the extent of binding as a 
function of concentration of each protein, and other relevant 
physical and chemical parameters such as salt concentration, 
temperature, pH, and prosthetic group concentrations (if any). 

In addition, the SBD with highest affinity for the target 
from each round is compared to the best SBD of the previous 
round (IPBD for the first round) and to the IPBD (second and 
later rounds) with respect to affinity for the target 
material. Successive rounds of mutagenesis and 
selection-through-binding yield increasing affinity until 
desired levels are achieved. 

If we find that the binding is not yet sufficient, we 
decide which residues to vary next. If the binding is 
sufficient, then we now have an expression vector bearing a 
gene encoding the desired novel binding protein. 

V.Q. Joint selections: 

One may modify the affinity separation of the method 
described to select a molecule that binds to material A but 
not to material B. One needs to prepare two selection 
columns, one with material A and the other with material B. 
The population of genetic packages is prepared in the manner 
described, but before applying the population to A, one passes 
the population over the B column so as to remove those members 
of the population that have high affinity for B ("reverse 
affinity chromatography"). In the preceding specification, 
the initial column supported some other molecule simply to 
remove GP(PBD)s that displayed PBDs having indiscriminate 
affinity for surfaces. 

It may be necessary to amplify the population that does 
not bind to B before passing it over A. Amplification would 
most likely be needed if A and B were in some ways similar and 
the PPBD has been selected for having affinity for A. The 
optimum order of interactions might be determined 
empirically. For example, to obtain an SBD that binds A but 
not B, three columns could be connected in series: a) a column 
supporting some compound, neither A nor B, or only the matrix 
material, b) a column supporting B, and c) a column supporting 
A. A population of GP(vgPBD)s is applied to the series of 
columns and the columns are washed with the buffer of constant 
ionic strength that is used in the application. The columns 
are uncoupled, and the third column is eluted with a gradient 
to isolate GP(PBD)s that bind A but not B. 
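
The logic of the three columns in series can be illustrated 
with a deliberately idealized model in which binding is 
all-or-none. Everything in the sketch (the Package class, the 
labels "matrix", "A", and "B", and the example population) is 
hypothetical and serves only to show which packages reach the 
third column. 

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    name: str
    binds: frozenset  # labels of the materials this package's PBD binds


def run_series(population, columns):
    """Pass the population through the columns in order.

    Returns (retained, flow_through), where retained[i] lists the
    packages held on columns[i]. Idealized: a package is retained by
    the first column whose ligand it binds.
    """
    retained = []
    flowing = list(population)
    for ligand in columns:
        held = [p for p in flowing if ligand in p.binds]
        flowing = [p for p in flowing if ligand not in p.binds]
        retained.append(held)
    return retained, flowing


population = [
    Package("sticky", frozenset({"matrix", "A", "B"})),
    Package("cross_reactive", frozenset({"A", "B"})),
    Package("wanted", frozenset({"A"})),
    Package("non_binder", frozenset()),
]
retained, flow_through = run_series(population, ["matrix", "B", "A"])
# Only the package that binds A but neither the matrix nor B
# is left on the third (A) column for gradient elution.
on_a_column = [p.name for p in retained[2]]
```
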

One can also generate molecules that bind to both A and 
B. In this case we can use a 3D model and mutate one face of 
the molecule in question to get binding to A. One can then 
mutate a different face to produce binding to B. When an SBD 
binds at least somewhat to both A and B, one can mutate the 
chain by Diffuse Mutagenesis to refine the binding and use a 
sequential joint selection for binding to both A and B. 

The materials A and B could be proteins that differ at 
only one or a few residues. For example, A could be a natural 
protein for which the gene has been cloned and B could be a 
mutant of A that retains the overall 3D structure of A. SBDs 
selected to bind A but not B probably bind to A near the 
residues that are mutated in B. If the mutations were picked 
to be in the active site of A (assuming A has an active site), 
then an SBD that binds A but not B will bind to the active 
site of A and is likely to be an inhibitor of A. 

To obtain a protein that will bind to both A and B, we 
can, alternatively, first obtain an SBD that binds A and a 
different SBD that binds B. We can then combine the genes 
encoding these domains so that a two-domain 
single-polypeptide protein is produced. The fusion protein 
will have affinity for both A and B because one of its domains 
binds A and the other binds B. 

One can also generate binding proteins with affinity for 
both A and B, such that these materials will compete for the 
same site on the binding protein. We guarantee competition by 
overlapping the sites for A and B. Using the procedures of 
the present invention, we first create a molecule that binds 
to target material A. We then vary a set of residues defined 
as: a) those residues that were varied to obtain binding to A, 
plus b) those residues close in 3D space to the residues of 
set (a) but that are internal and so are unlikely to bind 
directly to either A or B. Varying the residues in set (b) is 
likely to make small changes in the positioning of the 
residues in set (a) such that the affinities for A and B will 
be changed by small amounts. Members of these populations are 
selected for affinity to both A and B. 

V.R. Selection for non-binding: 

The method of the present invention can be used to select 
proteins that do not bind to selected targets. Consider a 
protein of pharmacological importance, such as streptokinase, 
that is antigenic to an undesirable extent. We can take the 
pharmacologically important protein as IPBD and antibodies 
against it as target. Residues on the surface of the 
pharmacologically important protein would be variegated and 
GP(PBD)s that do not bind to an antibody column would be 
collected and cultured. Surface residues may be identified in 
several ways, including: a) from a 3D structure, b) from 
hydrophobicity considerations, or c) chemical labeling. The 
3D structure of the pharmacologically important protein 
remains the preferred guide to picking residues to vary, 
except now we pick residues that are widely spaced so that we 
leave as little as possible of the original surface unaltered. 
Destroying binding frequently requires only that a single 
amino acid in the binding interface be changed. If polyclonal 
antibodies are used, we face the problem that all or most of 
the strong epitopes must be altered in a single molecule. 
Preferably, one would have a set of monoclonal antibodies, or 
a narrow range of antibody species. If we had a series of 
monoclonal antibody columns, we could obtain one or more 
mutations that abolish binding to each monoclonal antibody. 
We could then combine some or all of these mutations in one 
molecule to produce a pharmacologically important protein 
recognized by none of the monoclonal antibodies. Such mutants 
are tested to verify that the pharmacologically interesting 
properties have not been altered to an unacceptable degree by 
the mutations. 

Typically, polyclonal antibodies display a range of 
binding constants for antigen. Even if we have only 
polyclonal antibodies that bind to the pharmacologically 
important protein, we may proceed as follows. We engineer the 
pharmacologically important protein to appear on the surface 
of a replicable GP. We introduce mutations into residues that 
are on the surface of the pharmacologically important protein 
or into residues thought to be on the surface of the 
pharmacologically important protein so that a population of 
GPs is obtained. Polyclonal antibodies are attached to a 
column and the population of GPs is applied to the column at 
low salt. The column is eluted with a salt gradient. The GPs 
that elute at the lowest concentration of salt are those which 
bear pharmacologically important proteins that have been 
mutated in a way that eliminates binding to the antibodies 
having maximum affinity for the pharmacologically important 
protein. The GPs eluting at the lowest salt are isolated and 
cultured. The isolated SBD becomes the PPBD to further rounds 
of variegation so that the antigenic determinants are 
successively eliminated. 

V.S. Selection of PBDs for retention of structure: 

Let us take an SBD with known affinity for a target as
PPBD to a variegation of a region of the PBD that is far from
the residues that were varied to create the SBD. We can use
the target as an affinity molecule to select the PBDs that
retain binding for the target, and that presumably retain the
underlying structure of the IPBD. The variegations in this
case could include insertions and deletions that are likely to
disrupt the IPBD structure. We could also use the IPBD and
AfM(IPBD) in the same way.

For example, if IPBD were BPTI and AfM(BPTI) were
trypsin, we could introduce four or five additional residues
after residue 26 and select GPs that display PBDs having
specific affinity for AfM(BPTI). Residue 26 is chosen because
it is in a turn and because it is about 25 Å from K15, a key
amino acid in binding to trypsin.

The underlying structure is most likely to be retained if
insertions or deletions are made at loops or turns.

V.T. Engineering of Antagonists

It may be desirable to provide an antagonist to an enzyme
or receptor. This may be achieved by making a molecule that
prevents the natural substrate or agonist from reaching the
active site. Molecules that bind directly to the active site
may be either agonists or antagonists. Thus we adopt the
following strategy. We consider enzymes and receptors
together under the designation TER (Target Enzyme or
Receptor).

For most TERs, there exist chemical inhibitors that block
the active site. Usually, these chemicals are useful only as
research tools due to high toxicity. We make two affinity
matrices: one with active TER and one with blocked TER. We
make a variegated population of GP(PBD)s and select for SBDs
that bind to both forms of the enzyme, thereby obtaining SBDs
that do not bind to the active site. We expect that SBDs will
be found that bind different places on the enzyme surface.
Pairs of the sbd genes are fused with an intervening peptide
segment. For example, if SBD-1 and SBD-2 are binding domains
that show high affinity for the target enzyme and for which
the binding is non-competitive, then the gene
sbd-1::linker::sbd-2 encodes a two-domain protein that will show
high affinity for the target. We make several fusions having
a variety of SBDs and various linkers. Such compounds have a
reasonable probability of being an antagonist to the target
enzyme.
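
The pairing step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the SBD names, the linker names, and the set of competing pairs are hypothetical placeholders, and the function simply enumerates candidate sbd::linker::sbd constructs for pairs whose binding is non-competitive, in both domain orders.

```python
from itertools import combinations, product

def candidate_fusions(sbds, competes, linkers):
    """List sbd::linker::sbd gene names for non-competing SBD pairs.

    sbds     -- names of SBDs that bind both active and blocked TER
    competes -- set of frozensets naming pairs that compete for one site
    linkers  -- names of linker peptides to try (hypothetical)
    """
    fusions = []
    for a, b in combinations(sbds, 2):
        if frozenset((a, b)) in competes:
            continue  # competing binders probably share a site; skip
        # Try every linker with the two domains in either order.
        for linker, (first, second) in product(linkers, ((a, b), (b, a))):
            fusions.append(f"{first}::{linker}::{second}")
    return fusions

# Hypothetical inputs, for illustration only.
sbds = ["sbd-1", "sbd-2", "sbd-3"]
competes = {frozenset(("sbd-1", "sbd-3"))}
linkers = ["linkerA", "linkerB"]
fusions = candidate_fusions(sbds, competes, linkers)
```

Each construct in the resulting list would still have to be made and tested for antagonist activity; the enumeration only organizes the candidates.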

VI. EXPLOITATION OF SUCCESSFUL BINDING DOMAINS AND 
CORRESPONDING DNAS 

VI.A. Generally

Using the method of the present invention, we can obtain
a replicable genetic package that displays a novel protein
domain having high affinity and specificity for a target
material of interest. Such a package carries both amino-acid
embodiments of the binding protein domain and a DNA embodiment
of the gene encoding the novel binding domain. The presence
of the DNA facilitates expression of a protein comprising the
novel binding protein domain within a high-level expression
system, which need not be the same system used during the
developmental process.

VI.B. Production of Novel Binding Proteins

We can proceed to production of the novel binding protein
in several ways, including: a) altering of the gene encoding
the binding domain so that the binding domain is expressed as
a soluble protein, not attached to a genetic package (either
by deleting codons 5' of those encoding the binding domain or
by inserting stop codons 3' of those encoding the binding
domain), b) moving the DNA encoding the binding domain into a
known expression system, and c) utilizing the genetic package
as a purification system. (If the domain is small enough, it
may be feasible to prepare it by conventional peptide
synthesis methods.)

Option (c) may be illustrated as follows. Assume that a
novel BPTI derivative has been obtained by selection of M13
derivatives in which a population of BPTI-derived domains are
displayed as fusions to mature coat protein. Assume that a
specific protease cleavage site (e.g. that of activated
clotting factor X) is engineered into the amino-acid sequence
between the carboxy terminus of the BPTI-derived domain and
the mature coat domain. Furthermore, we alter the display
system to maximize the number of fusion proteins displayed on
each phage. The desired phage can be produced and purified,
for example by centrifugation, so that no bacterial products
remain. Treatment of the purified phage with a catalytic
amount of factor X cleaves the binding domains from the phage
particles. A second centrifugation step separates the cleaved
protein from the phage, leaving a very pure protein
preparation.

VI.C. Mini-Protein Production

As previously mentioned, an advantage inhering from the
use of a mini-protein as an IPBD is that it is likely that the
derived SBD will also behave like a mini-protein and will be
obtainable by means of chemical synthesis. (The term
"chemical synthesis", as used herein, includes the use of
enzymatic agents in a cell-free environment.)

It is also to be understood that mini-proteins obtained
by the method of the present invention may be taken as lead
compounds for a series of homologues that contain
non-naturally occurring amino acids and groups other than amino
acids. For example, one could synthesize a series of
homologues in which each member of the series has one amino
acid replaced by its D enantiomer. One could also make
homologues containing constituents such as β-alanine,
aminobutyric acid, 3-hydroxyproline, 2-aminoadipic acid,
N-ethylasparagine, norvaline, etc.; these would be tested for
binding and other properties of interest, such as stability
and toxicity.
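
As an illustration of the D-enantiomer series mentioned above, the following Python sketch enumerates the homologues of a peptide in which exactly one residue is replaced by its D form. The peptide used is a hypothetical placeholder, and the lower-case letter marking a D residue is only a notation chosen for this sketch. Glycine is skipped because it is achiral.

```python
def d_enantiomer_scan(sequence):
    """Return one homologue per chiral residue.

    Each homologue is the input sequence (one-letter code, upper case)
    with a single residue written in lower case to mark the D form.
    """
    homologues = []
    for i, residue in enumerate(sequence):
        if residue == "G":
            continue  # glycine has no D enantiomer
        homologues.append(sequence[:i] + residue.lower() + sequence[i + 1:])
    return homologues

# Hypothetical five-residue peptide, for illustration only.
series = d_enantiomer_scan("CAGKC")
```

Each member of such a series would then be synthesized and tested for binding, stability, and toxicity as the text describes.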

Peptides may be chemically synthesized either in solution
or on supports. Various combinations of stepwise synthesis
and fragment condensation may be employed.

During synthesis, the amino acid side chains are
protected to prevent branching. Several different protective
groups are useful for the protection of the thiol groups of
cysteines:

1) 4-methoxybenzyl (MBzl; Mob) (NISH82; ZAFA88), removable
with HF;
2) acetamidomethyl (Acm) (NISH82; NISH86; BECK89c), removable
with iodine; mercury ions (e.g., mercuric acetate);
silver nitrate; and
3) S-para-methoxybenzyl (HOUG84).

Other thiol protective groups may be found in standard 
reference works such as Greene, PROTECTIVE GROUPS IN ORGANIC 
SYNTHESIS (1981).

Once the polypeptide chain has been synthesized,
disulfide bonds must be formed. Possible oxidizing agents
include air (HOUG84; NISH86), ferricyanide (NISH82; HOUG84),
iodine (NISH82), and performic acid (HOUG84). Temperature,
pH, solvent, and chaotropic chemicals may affect the course of
the oxidation.

A large number of mini-proteins with a plurality of
disulfide bonds have been chemically synthesized in
biologically active form: conotoxin GI (13 AA, 4 Cys) (NISH82);
heat-stable enterotoxin ST (18 AA, 6 Cys) (HOUG84); analogues
of ST (BHAT86); ω-conotoxin GVIA (27 AA, 6 Cys) (NISH86;
RIVI87b); ω-conotoxin MVIIA (27 AA, 6 Cys) (OLIV87b);
α-conotoxin SI (13 AA, 4 Cys) (ZAFA88); μ-conotoxin IIIa (22 AA,
6 Cys) (BECK89c, CRUZ89, HATA90). Sometimes, the polypeptide
naturally folds so that the correct disulfide bonds are
formed. Other times, it must be helped along by use of a
differently removable protective group for each pair of
cysteines.
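
A short calculation helps show why pairwise protection can be needed. The number of distinct ways to pair an even number n of cysteines into n/2 disulfide bonds is (n-1)(n-3)...(1), so the choices multiply quickly. The Python sketch below computes this count; it is an illustrative aid and not part of the described synthesis procedures.

```python
def disulfide_pairings(n_cys):
    """Count the ways to pair n_cys cysteines into n_cys/2 disulfides.

    The first cysteine can pair with any of the other n_cys - 1, the
    next unpaired one with any of n_cys - 3, and so on.
    """
    if n_cys < 0 or n_cys % 2:
        raise ValueError("need an even, non-negative number of cysteines")
    count = 1
    for k in range(n_cys - 1, 0, -2):
        count *= k
    return count

four_cys = disulfide_pairings(4)   # e.g. the 4-Cys mini-proteins listed above
six_cys = disulfide_pairings(6)    # e.g. the 6-Cys mini-proteins listed above
```

With four cysteines there are 3 possible pairings and with six there are 15, only one of which is the native arrangement.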

VI.D. Uses of Novel Binding Proteins

The successful binding domains of the present invention
may, alone or as part of a larger protein, be used for any
purpose for which binding proteins are suited, including
isolation or detection of target materials. In furtherance of
this purpose, the novel binding proteins may be coupled
directly or indirectly, covalently or noncovalently, to a
label, carrier or support.

When used as a pharmaceutical, the novel binding proteins
may be contained with suitable carriers or adjuvants.
*****

All references cited anywhere in this specification are
incorporated by reference to the extent which they may be
pertinent.

EXAMPLE I

DISPLAY OF BPTI AS A FUSION TO M13 GENE VIII PROTEIN: 

Example I involves display of BPTI on M13 as a fusion to
the mature gene VIII coat protein. Each of the DNA
constructions was confirmed by restriction digestion analysis
and DNA sequencing.

1. Construction of the viii-signal-sequence::bpti::mature-
viii-coat-protein Display Vector.

A. Operative cloning vectors (OCV).

The operative cloning vectors are M13 and phagemids
derived from M13 or f1. The initial construction was in the
f1-based phagemid pGEM-3Zf(-)(TM) (Promega Corp., Madison, WI).

A gene comprising, in order: i) a modified lacUV5
promoter, ii) a Shine-Dalgarno sequence, iii) DNA encoding the
M13 gene VIII signal sequence, iv) a sequence encoding mature
BPTI, v) a sequence encoding the mature M13 gene VIII coat
protein, vi) multiple stop codons, and vii) a transcription
terminator, was constructed. This gene is illustrated in
Tables 101-105; each table shows the same DNA sequence with
different features annotated. There are a number of
differences between this gene and the one proposed in the
hypothetical example in the generic specification of the
parent application. Because the actual construction was made
in pGEM-3Zf(-), the ends of the synthetic DNA were made
compatible with SalI and BamHI. The lacO operator of lacUV5
was changed to the symmetrical lacO with the intention of
achieving tighter repression in the absence of IPTG. Several
silent codon changes were made so that the longest segment
identical to wild-type gene VIII is minimized, making
genetic recombination with the co-existing gene VIII
unlikely.
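
The effect of silent codon changes can be checked computationally. The Python sketch below is a minimal illustration, assuming the standard genetic code and two short aligned coding fragments that are hypothetical placeholders (they are not taken from Tables 101-105). It confirms that both fragments encode the same peptide and reports the longest run of identical bases at aligned positions, a simplified stand-in for the "longest identical segment" criterion.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna):
    """Translate a coding sequence, ignoring any trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

def longest_identical_run(a, b):
    """Longest stretch of matching bases at aligned positions of a and b."""
    best = run = 0
    for x, y in zip(a.upper(), b.upper()):
        run = run + 1 if x == y else 0
        best = max(best, run)
    return best

# Hypothetical fragments: same peptide, different third bases.
wild = "GCTGAGGGTGAC"
recoded = "GCAGAAGGCGAT"
same_protein = translate(wild) == translate(recoded)
run_length = longest_identical_run(wild, recoded)
```

A fuller analysis would search for the longest common substring at any offset; the aligned comparison is used here only to keep the sketch short.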
i) OCV based upon pGEM-3Zf.

pGEM-3Zf(TM) (Promega Corp., Madison, WI) is a
plasmid-based vector containing the amp gene, bacterial origin
of replication, bacteriophage f1 origin of replication, a lacZ
operon containing a multiple cloning site sequence, and the T7
and SP6 polymerase binding sequences.

Two restriction enzyme recognition sites were introduced,
by site-directed oligonucleotide mutagenesis, at the
boundaries of the lacZ operon. This allowed for the removal
of the lacZ operon and its replacement with the synthetic
gene. A BamHI recognition site (GGATCC) was introduced at the
5' end of the lacZ operon by the mutation of bases C331 and T332
to G and A respectively (numbering of Promega). A SalI
recognition site (GTCGAC) was introduced at the 3' end of the
operon by the mutation of bases C3021 and T3023 to G and C
respectively. A construct combining these variants of
pGEM-3Zf was designated pGEM-MB3/4.
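
Choosing where two point mutations can create a recognition site is easy to sketch in code. The following Python example is illustrative only: the input sequence is a hypothetical placeholder rather than vector sequence, and positions are 0-based. It scans for windows that differ from a recognition site at no more than a given number of bases and reports the changes required.

```python
def near_sites(seq, site, max_changes=2):
    """Find windows of seq within max_changes substitutions of site.

    Returns a list of (start, changes) where changes is a list of
    (position, current_base, required_base) tuples, all 0-based.
    """
    hits = []
    for start in range(len(seq) - len(site) + 1):
        window = seq[start:start + len(site)]
        changes = [
            (start + offset, have, want)
            for offset, (have, want) in enumerate(zip(window, site))
            if have != want
        ]
        if len(changes) <= max_changes:
            hits.append((start, changes))
    return hits

# Hypothetical sequence in which two substitutions yield a BamHI site.
hits = near_sites("TTGGCACCAA", "GGATCC")
```

In practice one would also check that each proposed change is acceptable in its context, for example that it does not disturb a coding region or regulatory element.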

ii) OCV based upon M13mp18.

M13mp18 (YANI85) is an M13 bacteriophage-based vector
(available from, inter alia, New England Biolabs, Beverly,
MA) consisting of the whole of the phage genome into which
has been inserted a lacZ operon containing a multiple cloning
site sequence (MESS77). Two restriction enzyme sites were
introduced into M13mp18 using standard methods. A BamHI
recognition site (GGATCC) was introduced at the 5' end of the
lacZ operon by the mutation of bases C6003 and G6004 to A and T
respectively (numbering of Messing). This mutation also
destroyed a unique NarI site. A SalI recognition site
(GTCGAC) was introduced at the 3' end of the operon by the
mutation of bases A6430 and C6432 to C and A respectively. A
construct combining these variants of M13mp18 was designated
M13-MB1/2.

B) Synthetic Gene.

A synthetic gene (VIII-signal-sequence::mature-
bpti::mature-VIII-coat-protein) was constructed from 16
synthetic oligonucleotides (Table 105), custom synthesized by
Genetic Designs Inc. of Houston, Texas, using methods detailed
in KIMH89 and ASHM89. Table 101 shows the DNA sequence; Table
102 contains an annotated version of this sequence. Table 103
shows the overlaps of the synthetic oligonucleotides in
relationship to the restriction sites and coding sequence.
Table 104 shows the synthetic DNA in double-stranded form.
Table 105 shows each of the 16 synthetic oligonucleotides from
5'-to-3'. The oligonucleotides were phosphorylated, with the
exception of the 5'-most molecules, using standard methods,
annealed and ligated in stages such that a final synthetic
duplex was generated. The overhanging ends of this duplex were
filled in with T4 DNA polymerase and it was cloned into the
HincII site of pGEM-3Zf(-); the initial construct is called
pGEM-MB1 (Table 101a). Double-stranded DNA of pGEM-MB1 was
cut with PstI, filled in with T4 DNA polymerase and ligated to
a SalI linker (New England BioLabs) so that the synthetic gene
is bounded by BamHI and SalI sites (Table 101b and Table
102b). The synthetic gene was obtained on a BamHI-SalI
cassette and cloned into pGEM-MB3/4 and M13-MB1/2 utilizing
the BamHI and SalI sites previously introduced, to generate
the constructs designated pGEM-MB16 and M13-MB15,
respectively. The full length of the synthetic insert was
sequenced and found to be unambiguously correct except for: 1)
a missing G in the Shine-Dalgarno sequence; and 2) a few
silent errors in the third bases of some codons (shown as
upper case in Table 101). Table 102 shows the ribosome-
binding site A104GGAGG but the actual sequence is A104GAGG.
Efforts to express protein from this construction, in vivo and
in vitro, were unavailing.

C) Alterations to the synthetic gene.

i) Ribosome binding site (RBS).

Starting with the construct pGEM-MB16, a fragment of DNA
bounded by the restriction enzyme sites SacI and NheI
(containing the original RBS) was replaced with a synthetic
oligonucleotide duplex (with compatible SacI and NheI
overhangs) containing the sequence for a new RBS that is very
similar to the RBS of E. coli phoA and that has been shown to
be functional.

Original putative RBS (5'-to-3') (SEQ ID NO: 137)

GAGCTCagaggCTTACTATGAAGAAATCTCTGGTTCTTAAGGCTAGC
|SacI|                                   |NheI|

New RBS (5'-to-3') (SEQ ID NO: 138)

GAGCTCTggaggaAATAAAATGAAGAAATCTCTGGTTCTTAAGGCTAGC
|SacI|                                     |NheI|

The putative RBSs above are lower case and the initiating
methionine codon is underscored and bold. The resulting
construct was designated pGEM-MB20. In vitro expression of
the gene carried by pGEM-MB20 produced a novel protein species
of the expected size, about 14.5 kd.
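
The two SacI-NheI fragments can be compared programmatically. The Python sketch below is an illustrative aid: it takes the two sequences as printed above, treats the lower-case run as the putative RBS (as the text states), and extracts the spacer between that run and the first ATG downstream of it. The function name is a hypothetical helper introduced for this sketch.

```python
import re

def rbs_and_spacer(fragment):
    """Return (putative RBS, spacer before the first downstream ATG).

    The putative RBS is taken to be the first lower-case run in the
    fragment, following the convention used in the listings above.
    """
    match = re.search(r"[a-z]+", fragment)
    if match is None:
        raise ValueError("no lower-case RBS annotation found")
    atg = fragment.index("ATG", match.end())
    return match.group(), fragment[match.end():atg]

old = "GAGCTCagaggCTTACTATGAAGAAATCTCTGGTTCTTAAGGCTAGC"
new = "GAGCTCTggaggaAATAAAATGAAGAAATCTCTGGTTCTTAAGGCTAGC"

old_rbs, old_spacer = rbs_and_spacer(old)
new_rbs, new_spacer = rbs_and_spacer(new)
```

On these inputs the spacer length is the same in both fragments, so the difference lies in the RBS itself and in the base composition of the spacer.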

ii) tac promoter.

In order to obtain higher expression levels of the fusion
protein, the lacUV5 promoter was changed to a tac promoter.
Starting with the construct pGEM-MB16, which contains the
lacUV5 promoter, a fragment of DNA bounded by the restriction
enzyme sites BamHI and HpaII was excised and replaced with a
compatible synthetic oligonucleotide duplex containing the -35
sequence of the trp promoter, cf. RUSS82. This converted the
lacUV5 promoter to a tac promoter in a construct designated
pGEM-MB22, Table 112.

MB16 (SEQ ID NO: 139) (SEQ ID NO: 140)

5'- GATCC tctagagtcggc TTTACA ctttatgcttc (cg-gctcg . . -3'
3'-     G agatctcagccg aaatgt gaaatacgaag gc (cgagc . . -5'
    BamHI              -35                HpaII

MB22 insert (SEQ ID NO: 141) (SEQ ID NO: 142)

5'- GATCC actccccatccccctg TTGACA attaatcat -3'
3'-     G tgaggggtagggggac AACTGT taattagtagc -5'
    BamHI                  -35            (HpaII)


Promoter and RBS variants of the fusion protein gene were
constructed by basic DNA manipulation techniques to generate
the following:

            Promoter   RBS   Encoded Protein
pGEM-MB16   lac        old   VIII s.p.-BPTI-matureVIII
pGEM-MB20   lac        new   "
pGEM-MB22   tac        old   "
pGEM-MB26   tac        new   "

The synthetic genes from variants pGEM-MB20 and pGEM-MB26
were recloned into the altered phage vector M13-MB1/2 to
generate the phage constructs designated M13-MB27 and M13-MB28
respectively.

iii. Signal Peptide Sequence. 

In vitro expression of the synthetic gene regulated by
tac and the "new" RBS produced a novel protein of the expected
size for the unprocessed protein (about 16 kd). In vivo
expression also produced novel protein of full size; no
processed protein could be seen on phage or in cell extracts
by silver staining or by Western analysis with anti-BPTI
antibody.

Thus we analyzed the signal sequence of the fusion.
Table 106 shows a number of typical signal sequences. Charged
residues are generally thought to be of great importance and
are shown bold and underscored. Each signal sequence contains
a long stretch of uncharged residues that are mostly
hydrophobic; these are shown in lower case. At the right, in
parentheses, is the length of the stretch of uncharged
residues. We note that the fusions of gene VIII signal to
BPTI and gene III signal to BPTI have rather short uncharged
segments. These short uncharged segments may reduce or
prevent processing of the fusion peptides. We know that the
gene III signal sequence is capable of directing: a) insertion
of the peptide comprising (mature-BPTI)::(mature-gene-III-
protein) into the lipid bilayer, and b) translocation of BPTI
and most of the mature gene III protein across the lipid
bilayer (vide infra). Retention of the gene III protein in
the lipid bilayer until the phage is assembled is directed by
the uncharged anchor region near the carboxy terminus of the
mature gene III protein (see Table 116) and not by the
secretion signal sequence. The phoA signal sequence can
direct secretion of mature BPTI into the periplasm of E. coli
(MARK86). Furthermore, there is controversy over the
mechanism by which mature authentic gene VIII protein comes to
be in the lipid bilayer prior to phage assembly.
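
The length of the uncharged stretch in a signal peptide is simple to compute. The Python sketch below is a minimal illustration that assumes D, E, K, and R are the charged residues (histidine is treated as uncharged here, which is a simplification) and uses the phoA signal peptide as it is spelled out later in this Example. It is not claimed to reproduce the values given in Table 106.

```python
# Residues counted as charged in this sketch (an assumption).
CHARGED = set("DEKR")

def longest_uncharged_stretch(peptide):
    """Length of the longest run of residues not in CHARGED."""
    best = run = 0
    for residue in peptide.upper():
        run = 0 if residue in CHARGED else run + 1
        best = max(best, run)
    return best

# phoA signal peptide as listed in this Example (met lys gln ser thr ...).
phoa_signal = "MKQSTIALLPLLFTPVTKA"
stretch = longest_uncharged_stretch(phoa_signal)
```

The same function can be applied to any of the candidate signal sequences to compare their uncharged segments.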

Thus we decided to replace the DNA coding on expression
for the gene-VIII-putative-signal-sequence by each of: 1) DNA
coding on expression for the phoA signal sequence, 2) DNA
coding on expression for the bla signal sequence, or 3) DNA
coding on expression for the M13 gene III signal. Each of
these replacements produces a tripartite gene encoding a
fusion protein that comprises, in order: (a) a signal peptide
that directs secretion into the periplasm of parts (b) and
(c), derived from a first gene; (b) an initial potential
binding domain (BPTI in this case), derived from a second gene
(in this case, the second gene is an animal gene); and (c) a
structural packaging signal (the mature gene VIII coat
protein), derived from a third gene.

The process by which the IPBD::packaging-signal fusion
arrives on the phage surface is illustrated in Figure 1. In
Figure 1a, we see that authentic gene VIII protein appears (by
whatever process) in the lipid bilayer so that both the amino
and carboxy termini are in the cytoplasm. Signal peptidase-I
cleaves the gene VIII protein liberating the signal peptide
(that is absorbed by the cell) and mature gene VIII coat
protein that spans the lipid bilayer. Many copies of mature
gene VIII coat protein accumulate in the lipid bilayer
awaiting phage assembly (Figure 1c). Some signal sequences
are able to direct the translocation of quite large proteins
across the lipid bilayer. If additional codons are inserted
after the codons that encode the cleavage site of the signal
peptidase-I of such a potent signal sequence, the encoded
amino acids will be translocated across the lipid bilayer as
shown in Figure 1b. After cleavage by signal peptidase-I, the
amino acids encoded by the added codons will be in the
periplasm but anchored to the lipid bilayer by the mature gene
VIII coat protein, Figure 1d. The circular single-stranded
phage DNA is extruded through a part of the lipid bilayer
containing a high concentration of mature gene VIII coat
protein; the carboxy terminus of each coat protein molecule
packs near the DNA while the amino terminus packs on the
outside. Because the fusion protein is identical to mature
gene VIII coat protein within the trans-bilayer domain, the
fusion protein will co-assemble with authentic mature gene
VIII coat protein as shown in Figure 1e.

In each case, the mature VIII coat protein moiety is
intended to co-assemble with authentic mature VIII coat
protein to produce phage particles having BPTI domains
displayed on the surface. The source and character of the
secretion signal sequence is not important because the signal
sequence is cut away and degraded. The structural packaging
signal, however, is quite important because it must
co-assemble with the authentic coat protein to make a working
virus sheath.

a) Bacterial Alkaline Phosphatase (phoA) Signal Peptide.

Construct pGEM-MB26 contains a fragment of DNA bounded by
restriction enzyme sites SacI and AccIII which contains the
new RBS and sequences encoding the initiating methionine and
the signal peptide of M13 gene VIII pro-protein. This
fragment was replaced with a synthetic duplex (constructed
from four annealed oligonucleotides) containing the RBS and
DNA coding for the initiating methionine and signal peptide of
PhoA (INOU82). The resulting construct was designated
pGEM-MB42; the sequence of the fusion gene is shown in Table 113.
M13-MB48 is a derivative of pGEM-MB42. A BamHI-SalI DNA fragment
from pGEM-MB42, containing the gene construct, was ligated into
a similarly cleaved vector M13-MB1/2 giving rise to M13-MB48.

PhoA RBS (SEQ ID NO: 143) and signal peptide sequence (SEQ ID
NO: 144)

5'-GAGCTCCATGGGAGAAAATAAA.ATG.AAA.CAA.AGC.ACG.-
   |SacI|                 met lys gln ser thr

   .ATC.GCA.CTC.TTA.CCG.TTA.CTG.TTT.ACC.CCT.GTG.ACA.-
    ile ala leu leu pro leu leu phe thr pro val thr

   .AAA.GCC.CGT.CCG.GAT.-3'
    lys ala arg pro asp
              |AccIII|

b) beta-lactamase signal peptide.

To enable the introduction of the beta-lactamase (amp)
promoter and DNA coding for the signal peptide into the gene
encoding (mature-BPTI)::(mature-VIII-coat-protein) an initial
manipulation of the amp gene (encoding beta-lactamase) was
required. Starting with pGEM-3Zf an AccIII recognition site
(TCCGGA) was introduced into the amp gene adjacent to the DNA
sequence encoding the amino acids at the beta-lactamase signal
peptide cleavage site. Using standard methods of in vitro
site-directed oligonucleotide mutagenesis bases C2504 and A2501
were converted to T and G respectively to generate the
construct designated pGEM-MB40. Further manipulation of
pGEM-MB40 entailed the insertion of a synthetic oligonucleotide
linker (CGGATCCG) containing the BamHI recognition sequence
(GGATCC) into the AatII site (GACGTC starting at nucleotide
number 2260) to generate the construct designated pGEM-MB45.
The DNA bounded by the restriction enzyme sites of BamHI and
AccIII contains the amp promoter, amp RBS, initiating
methionine and beta-lactamase signal peptide. This fragment
was used to replace the corresponding fragment from pGEM-MB26
to generate construct pGEM-MB46.

amp gene promoter (SEQ ID NO: 145) and signal peptide sequences
(SEQ ID NO: 146)

5'-GGATCCGGTGGCACTTTTCGGGGAAATGTGCGCGGAACCCCTATTTGTT-
   TATTTTTCTAAATACATTCAAATATGTATCCGCTCATGAGACAATAACC-
   CTGATAAATGCTTCAATAATATTGAAAAAGGAAGAGT-

   ATG.AGT.ATT.CAA.CAT.TTC.CGT.GTC.GCC.CTT.ATT.-
   met ser ile gln his phe arg val ala leu ile

   CCC.TTT.TTT.GCG.GCA.TTT.TGC.CTT.CCT.GTT.TTT.-
   pro phe phe ala ala phe cys leu pro val phe

   GCT.CAT.CCG.-3'
   ala his pro ....


c) M13-gene-III-signal::bpti::mature-VIII-coat-protein

We may also construct, as depicted in Figure 5, M13-MB51
which would carry a gene encoding a fusion of M13-gene-III-
signal-peptide to the previously described BPTI::mature VIII
coat protein. First the BstEII site that follows the stop
codons of the synthetic gene VIII is changed to an AlwNI site
as follows. DNA of pGEM-MB26 is cut with BstEII and the ends
filled in by use of Klenow enzyme; a blunt AlwNI linker is
ligated to this DNA. This construction is called
pGEM-MB26Alw. The XhoI to AlwNI fragment (approximately 300 bp) of
pGEM-MB26Alw is purified. RF DNA from phage MK-BPTI (vide
infra) is cut with AlwNI and XhoI and the large fragment
purified. These two fragments are ligated together; the
resulting construction is named M13-MB51. Because M13-MB51
contains no gene III, the phage can not form plaques.
M13-MB51 can, however, render cells KmR. Infectious phage
particles can be obtained by use of helper phage. As
explained below, the gene III signal sequence is capable of
directing (BPTI)::(mature-gene-III-protein) to the surface of
phage. In M13-MB51, we have inserted DNA encoding gene VIII
coat protein (50 amino acids) and three stop codons 5' to the
DNA encoding the mature gene III protein.

Summary of signal peptide fusion protein variants.

            Promoter   RBS   Signal sequence   Fusion protein
pGEM-MB26   tac        new   VIII              BPTI/VIII-coat
pGEM-MB42   tac        new   phoA              BPTI/VIII-coat
pGEM-MB46   amp        amp   amp               BPTI/VIII-coat
pGEM-MB51   III        III   III               BPTI/VIII-coat
M13-MB48    tac        new   phoA              BPTI/VIII-coat

2. Analysis of the Protein Products Encoded by the Synthetic
(signal-peptide::mature-bpti::viii-coat-protein) Genes

i) In vitro analysis 

A coupled transcription/translation prokaryotic system
(Amersham Corp., Arlington Heights, IL) was utilized for the
in vitro analysis of the protein products encoded by the
BPTI/VIII synthetic gene and the variants derived from this.
Table 107 lists the protein products encoded by the
listed vectors which are visualized by the standard method of
fluorography following in vitro synthesis in the presence of
35S-methionine and separation of the products using SDS
polyacrylamide gel electrophoresis. In each sample a
pre-beta-lactamase product (approximately 31 kd) can be seen.
This is derived from the amp gene which is the common
selection gene for each of the vectors. In addition, a
(pre-BPTI/VIII) product encoded by the synthetic gene and
variants can be seen as indicated. The migration of these
species (approximately 14.5 kd) is consistent with the
expected size of the encoded proteins.
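
The expected size can be sanity-checked with a rough residue count. The Python sketch below is a back-of-the-envelope estimate under stated assumptions: an average residue mass of about 110 Da (a common rule of thumb), a 23-residue gene VIII signal peptide and a 58-residue BPTI domain (commonly cited lengths that are not stated in this section), and the 50-residue mature coat protein mentioned above.

```python
AVG_RESIDUE_MASS_DA = 110  # rule-of-thumb average; an assumption

def approx_mass_kd(*segment_lengths):
    """Rough protein mass in kd from the residue counts of its segments."""
    return sum(segment_lengths) * AVG_RESIDUE_MASS_DA / 1000

# Assumed segment lengths: signal (23), BPTI (58), mature VIII coat (50).
pre_protein_kd = approx_mass_kd(23, 58, 50)
processed_kd = approx_mass_kd(58, 50)
```

Under these assumptions the unprocessed fusion comes out near 14.4 kd and the processed form near 11.9 kd; apparent sizes on SDS gels can differ from such estimates.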

ii) In vivo analysis. 

The vectors detailed in sections (B) and (C) were freshly
transfected into the E. coli strain XL1-Blue(TM) (Stratagene,
La Jolla, CA) and into strain SEF'. E. coli strain SE6004
(LISS85) carries the prlA4 mutation and is more permissive in
secretion than strains that carry the wild-type prlA allele.
SE6004 is F- and is deleted for lacI; thus the cells can not be
infected by M13 and lacUV5 and tac promoters can not be
regulated with IPTG. Strain SEF' is derived from strain
SE6004 (LISS85) by crossing with XL1-Blue(TM); the F' in
XL1-Blue(TM) carries TcR and lacIq. SE6004 is streptomycinR, TcS
while XL1-Blue(TM) is streptomycinS, TcR so that both parental
strains can be killed with the combination of Tc and
streptomycin. SEF' retains the secretion-permissive phenotype
of the parental strain, SE6004 (prlA4).

The fresh transfectants were grown in NZCYM medium
(SAMB89) for 1 hour after which IPTG was added over the range
of concentrations 1.0 µM to 0.5 mM (to derepress the lacUV5
and tac promoters) and grown for an additional 1.5 hours.
Aliquots of the bacterial cells expressing the synthetic
insert encoded proteins together with the appropriate controls
(no vector, vector with no insert and zero IPTG) were lysed in
SDS gel loading buffer and electrophoresed in 20%
polyacrylamide gels containing SDS and urea. Duplicate gels
were either silver stained (Daiichi, Tokyo, Japan) or
electrotransferred to a nylon matrix (Immobilon from
Millipore, Bedford, MA) for western analysis by standard means
using rabbit anti-BPTI polyclonal antibodies.

Table 108 lists the interesting proteins visualized on a
silver stained gel and by western analysis of an identical
gel. We can see clearly in the western analysis that protein
species containing BPTI epitopes are present in the test
strains which are absent from the control strains and which
are also IPTG inducible. In XL1-Blue(TM), the migration of this
species is predominantly that of the unprocessed form of the
pro-protein although a small proportion of the encoded
proteins appear to migrate at a size consistent with that of a
fully processed form. In SEF', the processed form
predominates, there being only a faint band corresponding to
the unprocessed species.

Thus in strain SEF', we have produced a tripartite fusion
protein that is specifically cleaved after the secretion
signal sequence. We believe that the mature protein comprises
BPTI followed by the gene VIII coat protein and that the coat
protein moiety spans the membrane. We believe that it is
highly likely that one or more copies, perhaps hundreds of
copies, of this protein will co-assemble into M13 derived
phage or M13-like phagemids. This construction will allow us
to a) mutagenize the BPTI domain, b) display each of the
variants on the coat of one or more phage (one type per
phage), and c) recover those phage that display variants
having novel binding properties with respect to target
materials of our choice.

Rasched and Oberer (RASC86) report that phage produced in
cells that express two alleles of gene VIII, that have
differences within the first 11 residues of the mature coat
protein, contain some of each protein. Thus, because we have
achieved in vivo processing of the
phoA(signal)::bpti::matureVIII fusion gene, it is highly
likely that co-expression of this gene with wild-type VIII
will lead to production of phage bearing BPTI domains on their
surface. Mutagenesis of the bpti domain of these genes will
provide a population of phage, each phage carrying a gene that
codes for the variant of BPTI displayed on the phage surface.

VIII Display Phage: Production, Preparation and Analysis.

i. Phage Production.

2 5 The OCV can be grown in XLl-Blue (TM) in the absence of the 

inducing agent, IPTG. Typically, a plaque plug is taken from 
a plate and grown in 2 ml of medium, containing freshly 
diluted bacterial cells, for 6 to 8 hours. Following 
centrif ugation of this culture the supernatant is taken and 

3 0 the phage titer determined. This is kept as a phage stock for 

further infection, phage production and display of the gene 
product of interest . 
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A 100 fold dilution of a fresh overnight culture of SEF ' 
bacterial cells in 500 ml of NZCYM medium is allowed to grow 
to a cell density of 0,4 (Ab 600nm) in a shaker incubator at 
37 °C. To this culture is added a sufficient amount of the 
5 phage stock to give a MOI of 10 together with IPTG to give a 
final concentration of 0.5 mM. The culture is allowed to grow 
for a further 2 hrs . 
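
As a rough planning aid, the phage input needed for the infection just described can be estimated from the culture volume, the cell density and the desired MOI. The sketch below assumes a conversion of 8 x 10^8 cells per ml per absorbance unit at 600 nm, a common rule of thumb for E. coli that is not stated in this specification; the function names and the example stock titer are likewise illustrative only.

```python
# Estimate the phage input for an infection at a chosen multiplicity of
# infection (MOI). CELLS_PER_ML_PER_A600 is an ASSUMED rule-of-thumb
# conversion for E. coli; it should be calibrated for the strain and
# spectrophotometer actually used.
CELLS_PER_ML_PER_A600 = 8e8


def phage_needed(volume_ml, a600, moi,
                 cells_per_ml_per_a600=CELLS_PER_ML_PER_A600):
    """Return the number of phage particles required for the given MOI."""
    total_cells = volume_ml * a600 * cells_per_ml_per_a600
    return moi * total_cells


def stock_volume_ml(phage_required, stock_titer_per_ml):
    """Return the volume of phage stock that supplies phage_required."""
    return phage_required / stock_titer_per_ml


if __name__ == "__main__":
    # Values from the text: 500 ml culture, absorbance 0.4, MOI of 10.
    required = phage_needed(500, 0.4, 10)
    print(f"phage required: {required:.2e}")
    # 1e12 phage per ml is a hypothetical stock titer, for illustration.
    print(f"stock volume: {stock_volume_ml(required, 1e12):.2f} ml")
```

Under the assumed conversion, the culture holds on the order of 10^11 cells, so an MOI of 10 calls for on the order of 10^12 phage.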

ii. Phage Preparation and Purification.

The phage-producing bacterial culture is centrifuged to
separate the phage in the supernatant from the bacterial
pellet. To the supernatant is added one quarter by volume of
phage precipitation solution (20% PEG, 3.75 M ammonium
acetate) and PMSF to a final concentration of 1 mM. The mixture
is left on ice for 2 hours, after which the precipitated phage
is retrieved by centrifugation. The phage pellet is redissolved
in Tris-EDTA containing 0.1% Sarkosyl and left at 4°C for 1
hour, after which any bacteria and bacterial debris are removed
by centrifugation. The phage in the supernatant is
reprecipitated with PEG overnight at 4°C. The phage pellet is
resuspended in LB medium and reprecipitated another two times
to remove the detergent. The phage is stored in LB medium at
4°C, titered, and used for analysis and binding studies.

A more stringent phage purification scheme involves
centrifugation in a CsCl gradient. 3.86 g of CsCl is dissolved
in NET buffer (0.1 M NaCl, 1 mM EDTA, 0.1 M Tris pH 7.7) up to a
volume of 10 ml. 10^12 to 10^13 phage in TE Sarkosyl buffer are
mixed with 5 ml of CsCl NET buffer and transferred to a
sealable ultracentrifuge tube. Centrifugation is performed
overnight at 34,000 rpm in a Sorvall OTD-65B Ultracentrifuge.
The tubes are opened and 400 µl aliquots are carefully
removed. 5 µl aliquots are removed from the fractions and
analysed by agarose gel electrophoresis after heating at 65°C
for 15 minutes together with the gel loading buffer containing
0.1% SDS. Fractions containing phage are pooled, the phage
reprecipitated and finally redissolved in LB medium to a
concentration of 10^12 to 10^13 phage per ml.
iii. Phage Analysis.

The display phage, together with appropriate controls, are
analyzed using standard methods of polyacrylamide gel
electrophoresis and either silver staining of the gel or
electrotransfer to a nylon matrix followed by analysis with
anti-BPTI antiserum (Western analysis). Quantitation of the
display of heterologous proteins is achieved by running a
serial dilution of the starting protein, for example BPTI,
together with the display phage samples in the electrophoresis
and Western analyses described above. An alternative method
involves running a 2 fold serial dilution of a phage in which
both the major coat protein and the fusion protein are
visualized by silver staining. A comparison of the relative
ratios of the two protein species allows one to estimate the
number of fusion proteins per phage since the number of VIII
gene encoded proteins per phage (approximately 3000) is known.
Incorporation of fusion protein into bacteriophage. 

In vivo expression of the processed BPTI:VIII fusion
protein, encoded by vectors GemMB42 (above and Table 113) and
M13MB48 (above), implied that the processed fusion product was
likely to be correctly located within the bacterial cell
membrane. This localization made it possible that it could be
incorporated into the phage and that the BPTI moiety would be
displayed at the bacteriophage surface.

SEF' cells were infected with either M13MB48 (consisting
of the starting phage vector M13mp18, altered as described
above, containing the synthetic gene consisting of a tac
promoter, functional ribosome binding site, phoA signal
peptide, mature BPTI and mature major coat protein) or
M13mp18, as a control. Phage infections, preparation and
purification were performed as described in Example VIII.

The resulting phage were electrophoresed (approximately
10^11 phage per lane) in a 20% polyacrylamide gel containing
urea, followed by electrotransfer to a nylon matrix and western
analysis using anti-BPTI rabbit serum. A single species of
protein was observed in phage derived from infection with the
M13MB48 stock phage which was not observed in the control
infection. This protein had a migration of about 12 kd,
consistent with that of the fully processed fusion protein.

Western analysis of SEF' bacterial lysate with or without
phage infection demonstrated another species of protein of
about 20 kd. This species was also present, to a lesser
degree, in phage preparations which were simply PEG
precipitated without further purification (for example, using
nonionic detergent or by CsCl gradient centrifugation). A
comparison of M13MB48 phage preparations made in the presence
or absence of detergent demonstrated that sarkosyl treatment
and CsCl gradient purification did remove the bacterial
contaminant while having no effect on the presence of the
BPTI:VIII fusion protein. This indicates that the fusion
protein has been incorporated and is a constituent of the
phage body.

The time course of phage production and BPTI:VIII
incorporation was followed post-infection and after IPTG
induction. Phage production and fusion protein incorporation
appeared to be maximal after two hours. This time course was
utilized in further phage productions and analyses.

Polyacrylamide electrophoresis of the phage preparations,
followed by silver staining, demonstrated that the
preparations were essentially free of contaminating protein
species and that an extra protein band was present in M13MB48
derived phage which was not present in the control phage. The
size of the new protein was consistent with that seen by
western analysis. A similar analysis of a serially diluted
BPTI:VIII incorporated phage demonstrated that the ratio of
fusion protein to major coat protein was typically in the
range of 1:150. Since the phage is known to contain on the
order of 3000 copies of the gene VIII product, this means that
the phage population contains, on average, tens of copies of
the fusion protein per phage.
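
The estimate above is simple proportional arithmetic, which may be sketched as follows. The function name is illustrative; the 1:150 ratio and the figure of roughly 3000 coat protein copies per phage are taken from the text.

```python
def fusion_copies_per_phage(fusion_parts, coat_parts, coat_copies_per_phage=3000):
    """Estimate fusion protein copies per phage from a band-intensity ratio.

    fusion_parts:coat_parts is the observed ratio of fusion protein to
    major coat protein; coat_copies_per_phage is the approximate number
    of gene VIII proteins per virion quoted in the text.
    """
    return coat_copies_per_phage * fusion_parts / coat_parts


if __name__ == "__main__":
    # Ratio of 1:150 reported for BPTI:VIII display phage.
    print(fusion_copies_per_phage(1, 150))
```

With a ratio of 1:150 this gives about 20 copies per phage, in line with the "tens of copies" stated above.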

Altering the initiating methionine of the natural gene VIII. 

The OCV M13MB48 contains the synthetic gene encoding the
BPTI:VIII fusion protein in the intergenic region of the
modified M13mp18 phage vector. The remainder of the vector
consists of the M13 genome, which contains the genes necessary
for various bacteriophage functions, such as DNA replication
and phage formation. In an attempt to increase the phage
incorporation of the fusion protein, we decided to try to
diminish the production of the natural gene VIII product, the
major coat protein, by altering the codon for the initiating
methionine of this gene to one encoding leucine. In such
cases, methionine is actually incorporated, but the rate of
initiation is reduced. The change was achieved by standard
methods of site-specific oligonucleotide mutagenesis as
follows.



                    M   K   K   S  - rest of VIII
    ACT.TCC.TC.ATG.AAA.AAG.TCT.   (SEQ ID NOs: 96 and 97)
    rest of XI - T   S   S   stop

(The amino acid sequence MKKS has SEQ ID NO: 9)

Site-specific mutagenesis.

                   (L)  K   K   S  - rest of VIII
    ACT.TCC.AG.CTG.AAA.AAG.TCT.   (SEQ ID NOs: 98 and 99)
    rest of XI - T   S   S   stop

(The amino acid sequence LKKS has SEQ ID NO: 260)

Note that the 3' end of the XI gene overlaps with the 5'
end of the VIII gene. Changes in DNA sequence were designed
such that the desired change in the VIII gene product could be
achieved without alterations to the predicted amino acid
sequence of the gene XI product. A diagnostic PvuII
recognition site was introduced at this site.
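
The design constraints described above (gene XI unchanged, gene VIII initiation codon changed, new PvuII site) can be checked mechanically. The following is a minimal sketch using the 20-nucleotide windows shown above with the dots removed; the frame offsets are read off the diagram, and the PvuII recognition sequence CAGCTG and the standard genetic code are general knowledge rather than statements from this specification.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna, start=0):
    """Translate dna from offset start, stopping after the first stop codon."""
    protein = []
    for i in range(start, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3].upper()]
        protein.append(residue)
        if residue == "*":
            break
    return "".join(protein)


WILD_TYPE = "ACTTCCTCATGAAAAAGTCT"  # window around the XI/VIII overlap
MUTANT = "ACTTCCAGCTGAAAAAGTCT"     # after site-specific mutagenesis
GENE_XI_FRAME = 0    # ACT TCC TCA TGA -> T S S stop
GENE_VIII_FRAME = 8  # ATG AAA AAG TCT -> M K K S
PVUII_SITE = "CAGCTG"

if __name__ == "__main__":
    for name, seq in (("wild type", WILD_TYPE), ("mutant", MUTANT)):
        print(name,
              "XI:", translate(seq, GENE_XI_FRAME),
              "VIII:", translate(seq, GENE_VIII_FRAME),
              "PvuII site present:", PVUII_SITE in seq)
```

The literal translation of the mutant gene VIII frame begins with L; as the text notes, methionine is nevertheless incorporated at a CTG initiation codon, only at a reduced rate of initiation.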

It was anticipated that initiation of the natural gene
VIII product would be hindered, enabling a higher proportion
of the fusion protein to be incorporated into the resulting
phage.

Analyses of the phage derived from this modified vector
indicated that there was a significant increase in the ratio
of fusion protein to major coat protein. Quantitative
estimates indicated that within a phage population as many as
100 copies of the BPTI:VIII fusion were incorporated per
phage.

Incorporation of interdomain extension fusion proteins into
phage.

A phage pool containing a variegated pentapeptide
extension at the BPTI:coat protein interface (see Example VII)
was used to infect SEF' cells. IPTG induction, phage
production and preparation were as described in Example VIII.
Using the criteria detailed in the previous section, it was
determined that extended fusion proteins were incorporated
into phage. Gel electrophoresis of the generated phage,
followed by either silver staining or western analysis with
anti-BPTI rabbit serum, demonstrated fusion proteins that
migrated similarly to, but discernibly slower than, the
starting fusion protein.

With regard to the 'EGGGS linker' (SEQ ID NO:10)
extensions of the domain interface, individual phage stocks
predicted to contain one or more 5-amino-acid unit extensions
were analyzed in a similar fashion. The migration of the
extended fusion proteins was readily distinguishable from that
of the parent fusion protein when viewed by western analysis or
silver staining. Those clones analyzed in more detail
included M13.3X4 (which contains a single inverted EGGGS (SEQ
ID NO:10) linker with a predicted amino acid sequence of GSSSL
(SEQ ID NO:16)), M13.3X7 (which contains a correctly
orientated linker with a predicted amino acid sequence of
EGGGS (SEQ ID NO:10)), M13.3X11 (which contains 3 linkers with
an inversion and a predicted amino acid sequence for the
extension of EGGGSGSSSLGSSSL (SEQ ID NO:11)) and M13.3Xd, which
contains an extension consisting of at least 5 linkers or 25
amino acids.

The extended fusion proteins were all incorporated into
phage at high levels (on average tens of copies per phage were
present) and, when analyzed by gel electrophoresis, migrated at
rates consistent with the predicted size of the extension.
Clones M13.3X4 and M13.3X7 migrated at a position very similar
to but discernibly different from the parent fusion protein,
while M13.3X11 and M13.3Xd were markedly larger.

Display of BPTI:VIII fusion protein by bacteriophage. 

The BPTI:VIII fusion protein had been shown to be
incorporated into the body of the phage. This phage was
analyzed further to demonstrate that the BPTI moiety was
accessible to specific antibodies and hence displayed at the
phage surface.

The assay is detailed in Example II, but principally
involves the addition of purified anti-BPTI IgG (from the
serum of BPTI injected rabbits) to a known titer of phage.
Following incubation, protein A-agarose beads are added to
bind the IgG and left to incubate overnight. The IgG-protein
A beads and any bound phage are removed by centrifugation,
followed by a retitering of the supernatant to determine any
loss of phage. The phage bound to the beads can be acid
eluted and titered also. Appropriate controls are included in
the assay, such as a wild type phage stock (M13mp18) and IgG
purified from normal rabbit pre-immune serum.

Table 140 shows that while the titer of the wild type
phage is unaltered by the presence of anti-BPTI IgG,
BPTI-IIIMK (the positive control for the assay) demonstrated a
significant drop in titer with or without the extra addition
of protein A beads. (Note that since the BPTI moiety is part
of the III gene product which is involved in the binding of
phage to bacterial pili, such a phenomenon is entirely
expected.) Two batches of M13MB48 phage (containing the
BPTI:VIII fusion protein) demonstrated a significant reduction
in titer, as judged by plaque forming units, when anti-BPTI
antibodies and protein A beads were added to the phage. The
initial drop in titer with the antibody alone differs
somewhat between the two batches of phage. This may be a
result of experimental or batch variation. Retrieval of the
immunoprecipitated phage, while not quantitative, was
significant when compared to the wild type phage control.

Further control experiments relating to this section are
shown in Table 141 and Table 142. The data demonstrated that
the loss in titer observed for the BPTI:VIII containing phage
is a result of the display of BPTI epitopes by these phage and
the specific interaction with anti-BPTI antibodies. No
significant interaction with either protein A agarose beads or
IgG purified from normal rabbit serum could be demonstrated.
The larger drop in titer for M13MB48 batch five reflects the
higher level incorporation of the fusion protein in this
preparation.

Functionality of the BPTI moiety in the BPTI-VIII display
phage.

The previous two sections demonstrated that the BPTI:VIII
fusion protein has been incorporated into the phage body and
that the BPTI moiety is displayed at the phage surface. To
demonstrate that the displayed molecule is functional, binding
experiments were performed in a manner almost identical to
that described in the previous section except that proteases
were used in place of antibodies. The display phage, together
with appropriate controls, are allowed to interact with
immobilized proteases or immobilized inactivated proteases.
Binding can be assessed by monitoring the loss in titer of the
display phage or by determining the number of phage bound to
the respective beads.
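
The two readouts just described (loss in titer, and phage recovered from the beads relative to a non-display control) reduce to simple ratios. The sketch below is illustrative only: the function names are not from this specification, and the titers in the example are hypothetical placeholders, not values from Tables 143 to 146.

```python
def percent_titer_loss(input_titer, supernatant_titer):
    """Percentage of input phage no longer found in the supernatant."""
    return 100.0 * (input_titer - supernatant_titer) / input_titer


def fraction_bound(eluted_titer, input_titer):
    """Fraction of input phage recovered by elution from the beads."""
    return eluted_titer / input_titer


def relative_binding(display_fraction, control_fraction):
    """Fold binding of display phage over the non-display control."""
    return display_fraction / control_fraction


if __name__ == "__main__":
    # HYPOTHETICAL titers (plaque-forming units), for illustration only.
    display_input, display_supernatant, display_eluted = 1e10, 2e9, 5e7
    control_input, control_eluted = 1e10, 1e6

    print(percent_titer_loss(display_input, display_supernatant))
    print(relative_binding(
        fraction_bound(display_eluted, display_input),
        fraction_bound(control_eluted, control_input),
    ))
```

Expressing binding as a fold value over the wild type control is what allows statements such as "40 to 60 times better than the negative control" to be compared across experiments with different input titers.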

Table 143 shows the results of an experiment in which
BPTI:VIII display phage, M13MB48, were allowed to bind to
anhydrotrypsin-agarose beads. There was a significant drop in
titer when compared to wild type phage, which do not display
BPTI. A pool of phage (BAA Pool), each containing a variegated 5
amino acid extension at the BPTI:major coat protein interface,
demonstrated a similar decline in titer. In a control
experiment (Table 143) very little non-specific binding of the
above display phage was observed with agarose beads to which
an unrelated protein (streptavidin) is attached.

Actual binding of the display phage is demonstrated by
the data shown for two experiments in Table 144. The negative
control is wild type M13mp18 and the positive control is
BPTI-IIIMK, a phage in which the BPTI moiety, attached to the gene
III protein, has been shown to be displayed and functional.
M13MB48 and M13MB56 both bind to anhydrotrypsin beads in a
manner comparable to that of the positive control, being 40 to
60 times better than the negative control (non-display phage).
Hence functionality of the BPTI moiety, in the major coat
fusion protein, was established.

To take this analysis one step further, a comparison of
phage binding to active and inactivated trypsin is shown in
Table 145. The control phage, M13mp18 and BPTI-IIIMK,
demonstrated binding similar to that detailed in Example III.
Note that the relative binding is enhanced with trypsin due to
the apparent marked reduction in the non-specific binding of
the wild type phage to the active protease. M13.3X7 and
M13.3X11, which both contain 'EGGGS' linker (SEQ ID NO:10)
extensions at the domain interface, bound to anhydrotrypsin
and trypsin in a manner similar to BPTI-IIIMK phage. The
binding, relative to non-display phage, was approximately 100
fold higher in the anhydrotrypsin binding assay and at least
1000 fold higher in the trypsin binding assay. The binding of
another 'EGGGS' (SEQ ID NO:10) linker variant (M13.3Xd) was
similar to that of M13.3X7.

To demonstrate the specificity of binding, the assays were
repeated with human neutrophil elastase (HNE) beads and
compared to that seen with trypsin beads (Table 146). BPTI has
a very high affinity for trypsin and a low affinity for HNE;
hence the BPTI display phage should reflect these affinities
when used in binding assays with these beads. The negative
and positive controls for trypsin binding were as already
described above, while an additional positive control for the
HNE beads, BPTI(K15L,MGNG)-III MA (MGNG has SEQ ID NO:12; see
Example III), was included. The results, shown in Table 146,
confirmed this prediction. M13MB48, M13.3X7 and M13.3X11
phage demonstrated good binding to trypsin, relative to wild
type phage and the HNE control (BPTI(K15L,MGNG)-III MA) (The
amino acid sequence MGNG has SEQ ID NO:12; BPTI(....,MGNG)
denotes a homologue of BPTI having M39, G40, N41, G42,
where .... may indicate other alterations.), being
comparable to BPTI-IIIMK phage. Conversely, poor binding
occurred when HNE beads were used, with the exception of the
HNE positive control phage.

Taken together, the accumulated data demonstrated that
when BPTI is part of a fusion protein with the major coat
protein of M13 phage, the molecule is both displayed at the
surface of the phage and a significant proportion of it is
functional in a specific protease binding manner.

***

EXAMPLE II 

CONSTRUCTION OF BPTI/GENE-III DISPLAY VECTOR

DNA manipulations were conducted according to standard
procedures as described in Maniatis et al. (MANI82). First
the unwanted lacZ gene of M13-MB1/2 was removed. M13-MB1/2 RF
was cut with BamHI and SalI and the large fragment was
isolated by agarose gel electrophoresis. The recovered 6819
bp fragment was filled in with Klenow fragment of E. coli DNA
polymerase and ligated to a synthetic HindIII 8mer linker
(CAAGCTTG). The ligation sample was used to transfect
competent XL1-Blue(TM) (Stratagene, La Jolla, CA) cells which
were subsequently plated for plaque formation. RF DNA was
prepared from chosen plaques and a clone, M13-MB1/2-delta,
containing regenerated BamHI and SalI sites as well as a new
HindIII site, all 500 bp upstream of the BglII site (6935), was
picked.

A unique NarI site was introduced into codons 17 and 18
of gene III (changing the amino acids from H-S to G-A, cf.
Table 110). 10^6 phage produced from bacterial cells harboring
the M13-MB1/2-delta RF DNA were used to infect a culture of
CJ236 cells (relevant genotype: F', dut1, ung1, CmR)
(OD595 = 0.35). Following overnight incubation at 37°C, phage
were recovered and uracil-containing ss DNA was extracted from
phage in accord with the instructions for the MUTA-GENE(R) M13
in vitro Mutagenesis Kit (Catalogue Number 170-3571, Bio-Rad,
Richmond, CA). Two hundred nanograms of the purified single
stranded DNA was annealed to 3 picomoles of a phosphorylated
25mer mutagenic oligonucleotide,

    5'-gtttcagcggCgCCagaatagaaag-3' (SEQ ID NO:147),

where upper case indicates the changes. Following filling in
with T4 DNA polymerase and ligation with T4 DNA ligase, the
reaction sample was used to transfect competent XL1-Blue(TM)
cells which were subsequently plated to permit the formation
of plaques.
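
The mutagenic oligonucleotide can be sanity-checked against the stated design goals (a 25mer, a NarI site, and G-A at the altered codons). The sketch below assumes that the oligonucleotide is complementary to the coding strand and that the reading frame begins at the third nucleotide of its reverse complement; both points are inferred here from the sequence itself and are not stated in the text. The NarI recognition sequence GGCGCC and the standard genetic code are general knowledge.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

MUTAGENIC_OLIGO = "gtttcagcggCgCCagaatagaaag"  # upper case marks the changes
NARI_SITE = "GGCGCC"
ASSUMED_FRAME = 2  # ASSUMPTION: coding frame offset in the reverse complement


def reverse_complement(seq):
    """Return the reverse complement, preserving upper/lower case."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def translate(dna, start=0):
    """Translate complete codons of dna beginning at offset start."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]]
                   for i in range(start, len(dna) - 2, 3))


if __name__ == "__main__":
    coding = reverse_complement(MUTAGENIC_OLIGO)
    print("length:", len(MUTAGENIC_OLIGO))
    print("changed bases:", sum(ch.isupper() for ch in MUTAGENIC_OLIGO))
    print("NarI site present:", NARI_SITE in MUTAGENIC_OLIGO.upper())
    print("coding strand:", coding)
    print("translation:", translate(coding, ASSUMED_FRAME))
```

Under these assumptions the altered codons read GGC GCC (Gly-Ala), which is also the NarI recognition sequence, so the restriction site and the amino acid change are introduced together.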

RF DNA, isolated from phage-infected cells which had been
allowed to propagate in liquid culture for 8 hours, was
denatured, spotted on a Nytran membrane, baked and hybridized
to the 25mer mutagenic oligonucleotide which had previously
been phosphorylated with 32P-ATP. Clones exhibiting strong
hybridization signals at 70°C (6°C less than the theoretical
Tm of the mutagenic oligonucleotide) were chosen for large
scale RF preparation. The presence of a unique NarI site at
nucleotide 1630 was confirmed by restriction enzyme analysis.
The resultant RF DNA, M13-MB1/2-delta-NarI, was cut with
BamHI, dephosphorylated with calf intestinal phosphatase, and
ligated to a 1.3 Kb BamHI fragment, encoding the
kanamycin-resistance gene (kan), derived from plasmid pUC4K
(Pharmacia, Piscataway, NJ). The ligation sample was used to
transfect competent XL1-Blue(TM) cells which were subsequently
plated onto LB plates containing kanamycin (Km). RF DNA was
prepared from KmR colonies and subjected to restriction enzyme
analysis to confirm the insertion of kan into
M13-MB1/2-delta-NarI DNA, thereby creating the phage MK. Phage
MK grows as well as wild-type M13, indicating that the changes
at the cleavage site of gene III protein are not detectably
deleterious to the phage.
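
The hybridization temperature quoted above (70°C, described as 6°C below the theoretical Tm) is consistent with the simple base-composition rule Tm = 2(A+T) + 4(G+C) °C, often called the Wallace rule. The text does not say which formula was used, so applying this rule is an assumption; the sketch below merely shows that it reproduces the stated figure for the 25mer.

```python
def wallace_tm(oligo):
    """Approximate Tm in degrees C as 2*(A+T) + 4*(G+C).

    ASSUMPTION: the specification does not name the formula behind its
    'theoretical Tm'; this rule of thumb is used here for illustration.
    """
    seq = oligo.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    oligo = "gtttcagcggCgCCagaatagaaag"  # 25mer mutagenic oligonucleotide
    tm = wallace_tm(oligo)
    print("estimated Tm:", tm)
    print("6 degrees below Tm:", tm - 6)
```

The 25mer contains 13 G or C and 12 A or T residues, giving an estimate of 76°C, so washing at 70°C corresponds to the 6°C margin described in the text.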
INSERTION OF SYNTHETIC BPTI GENE 

The construction of the BPTI-III expression vector is
shown in Figure 6. The synthetic bpti-VIII fusion contains a
NarI site that comprises the last two codons of the
BPTI-encoding region. A second NarI site was introduced upstream
of the BPTI-encoding region as follows. RF DNA of phage
M13-MB26 was cut with AccIII and ligated to the dsDNA adaptor:

    5'-TATTCTGGCGCCCGT-3'        (SEQ ID NO:148)
    3'-ATAAGACCGCGGGCAGGCC-5'    (SEQ ID NO:149)
             |NarI|   |AccIII

The ligation sample was subsequently restricted with NarI and
a 180 bp DNA fragment encoding BPTI was isolated by agarose
gel electrophoresis. RF DNA of phage MK was digested with
NarI, dephosphorylated with calf intestinal phosphatase and
ligated to the 180 bp fragment. Ligation samples were used to
transfect competent XL1-Blue(TM) cells which were plated to
enable the formation of plaques. DNA, isolated from phage
derived from plaques, was denatured, applied to a Nytran
membrane, baked and hybridized to a 32P-phosphorylated double
stranded DNA probe corresponding to the BPTI gene. Large
scale RF preparations were made for clones exhibiting a strong
hybridization signal. Restriction enzyme digestion analysis
confirmed the insertion of a single copy of the synthetic BPTI
gene into gene III of MK to generate phage MK-BPTI.
Subsequent DNA sequencing confirmed that the sequence of the
bpti-III fusion gene is correct and that the correct reading
frame is maintained (Table 111). Table 116 shows the entire
coding region, the translation into protein sequence, and the
functional parts of the polypeptide chain.
EXPRESSION OF THE BPTI-III FUSION GENE IN VITRO 

MK-BPTI RF DNA was added to a coupled prokaryotic
transcription-translation extract (Amersham). Newly
synthesized radiolabelled proteins were produced and
subsequently separated by electrophoresis on a 15%
SDS-polyacrylamide gel, which was subjected to fluorography.
The MK-BPTI DNA directs the synthesis of an unprocessed gene
III fusion protein which is 7 Kd larger than the gene III
product encoded by MK. This is consistent with the insertion of
58 amino acids of BPTI into the gene III protein.
Immunoprecipitation of radiolabelled proteins generated by the
cell-free prokaryotic extract was conducted. Neither rabbit
anti(M13-gene-VIII-protein) IgG nor normal rabbit IgG was able
to immunoprecipitate the gene III protein encoded by either MK
or MK-BPTI. However, rabbit anti-BPTI IgG is able to
immunoprecipitate the gene III protein encoded by MK-BPTI but
not by MK. This confirms that the increase in size of the III
protein encoded by MK-BPTI is attributable to the insertion of
the BPTI protein.
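
The size shift expected from inserting the 58 amino acids of BPTI can be approximated from an average residue mass. The sketch below assumes an average of about 110 daltons per residue, a general rule of thumb that is not given in this specification; the exact mass depends on the actual amino acid composition.

```python
# ASSUMPTION: mean residue mass of roughly 110 Da, a common approximation.
AVERAGE_RESIDUE_MASS_DA = 110.0


def approx_mass_kd(n_residues, average_residue_mass_da=AVERAGE_RESIDUE_MASS_DA):
    """Return the approximate mass in kilodaltons of n_residues amino acids."""
    return n_residues * average_residue_mass_da / 1000.0


if __name__ == "__main__":
    # 58 residues of BPTI inserted into the gene III protein.
    print(f"approximate insert mass: {approx_mass_kd(58):.2f} Kd")
```

This gives an estimate of about 6.4 Kd for the insert, which is compatible with the shift of roughly 7 Kd seen on the gel, given the limited resolution of SDS-polyacrylamide sizing.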
WESTERN ANALYSIS 

Phage were recovered from bacterial cultures by PEG
precipitation. To remove residual bacterial cells, recovered
phage were resuspended in a high salt buffer and subjected to
centrifugation, in accord with the instructions for the
MUTA-GENE(R) M13 in vitro Mutagenesis Kit (Catalogue Number
170-3571, Bio-Rad, Richmond, CA). Aliquots of phage (containing
up to 40 µg of protein) were subjected to electrophoresis on a
12.5% SDS-urea-polyacrylamide gel and proteins were
transferred to a sheet of Immobilon by electro-transfer.
Western blots were developed using rabbit anti-BPTI serum,
which had previously been incubated with an E. coli extract,
followed by goat anti-rabbit antibody conjugated to alkaline
phosphatase. An immunoreactive protein of 67 Kd is detected
in preparations of the MK-BPTI but not the MK phage. The size
of the immunoreactive protein is consistent with the predicted
size of a processed BPTI-III fusion protein (6.4 Kd plus 60
Kd). These data indicate that BPTI-specific epitopes are
presented on the surface of the MK-BPTI phage but not the MK
phage.

NEUTRALIZATION OF PHAGE TITER WITH AGAROSE-IMMOBILIZED
ANHYDRO-TRYPSIN

Anhydro-trypsin is a derivative of trypsin in which the
active site serine has been converted to dehydroalanine.
Anhydro-trypsin retains the specific binding of trypsin but
not the protease activity. Unlike polyclonal antibodies,
anhydro-trypsin is not expected to bind unfolded BPTI or
incomplete fragments.

Phage MK-BPTI and MK were diluted to a concentration of
1.4 x 10^12 particles per ml in TBS buffer (PARM88) containing
1.0 mg/ml BSA. Thirty microliters of diluted phage were added
to 2, 5, or 10 microliters of a 50% slurry of
agarose-immobilized anhydro-trypsin (Pierce Chemical Co.,
Rockford, IL) in TBS/BSA buffer. Following incubation at 25°C,
aliquots were removed, diluted in ice cold LB broth and titered
for plaque-forming units on a lawn of XL1-Blue(TM) cells. Table
114 illustrates that incubation of the MK-BPTI phage with
immobilized anhydro-trypsin results in a very significant loss
in titer over a four hour period while no such effect is
observed with the MK (control) phage. The reduction in phage
titer is also proportional to the amount of immobilized
anhydro-trypsin added to the MK-BPTI phage. Incubation with
five microliters of a 50% slurry of agarose-immobilized
streptavidin (Sigma, St. Louis, MO) in TBS/BSA buffer does not
reduce the titer of either the MK-BPTI or MK phage. These
data are consistent with the presentation of a correctly
folded, functional BPTI protein on the surface of the MK-BPTI
phage but not on the MK phage. Unfolded or incomplete BPTI
domains are not expected to bind anhydro-trypsin.
Furthermore, unfolded BPTI domains are expected to be
non-specifically sticky.

NEUTRALIZATION OF PHAGE TITER WITH ANTI-BPTI ANTIBODY

MK-BPTI and MK phage were diluted to a concentration of
4 x 10^8 plaque-forming units per ml in LB broth. Fifteen
microliters of diluted phage were added to an equivalent
volume of either rabbit anti-BPTI serum or normal rabbit serum
(both diluted 10 fold in LB broth). Following incubation at
37°C, aliquots were removed, diluted by 10^4 in ice-cold LB
broth and titered for plaque-forming units on a lawn of
XL1-Blue(TM) cells. Incubation of the MK-BPTI phage with
anti-BPTI serum results in a steady loss in titer over a two
hour period while no such effect is observed with the MK phage.
As expected, normal rabbit serum does not reduce the titer of
either the MK-BPTI or the MK phage. Prior incubation of the
anti-BPTI serum with authentic BPTI protein, but not with an
equivalent amount of E. coli protein, blocks the ability of
the serum to reduce the titer of the MK-BPTI phage. These data
are consistent with the presentation of BPTI-specific epitopes
on the surface of the MK-BPTI phage but not the MK phage.
More specifically, the data indicate that these BPTI epitopes
are associated with the gene III protein and that association
of this fusion protein with an anti-BPTI antibody blocks its
ability to mediate the infection of bacterial cells.
NEUTRALIZATION OF PHAGE TITER WITH TRYPSIN 

MK-BPTI and MK phage were diluted to a concentration of
4 x 10^8 plaque-forming units per ml in LB broth. Diluted phage
were added to an equivalent volume of trypsin diluted to
various concentrations in LB broth. Following incubation at
37°C, aliquots were removed, diluted by 10^4 in ice cold LB
broth and titered for plaque-forming units on a lawn of
XL1-Blue(TM) cells. Incubation of the MK-BPTI phage with 0.15 µg
of trypsin results in a 70% loss in titer after a two hour
period while only a 15% loss in titer is observed for the MK
phage. A reduction in the amount of trypsin added to phage
results in a reduction in the loss of titer. However, at all
trypsin concentrations investigated, the MK-BPTI phage are
more sensitive to incubation with trypsin than the MK phage.
An interpretation of these data is that association of the
BPTI-III fusion protein displayed on the surface of the
MK-BPTI phage with trypsin blocks its ability to mediate the
infection of bacterial cells.

The reduction in titer of phage MK by trypsin is an
example of a phenomenon that is likely to be general:
proteases, if present in sufficient quantity, will degrade
proteins on the phage and reduce infectivity. The present
application lists several means that can be used to overcome
this problem.

AFFINITY SELECTION SYSTEM 

Affinity Selection with Immobilized Anhydro-Trypsin

MK-BPTI and MK phage were diluted to a concentration of
1.4 x 10^12 particles per ml in TBS buffer (PARM88) containing
1.0 mg/ml BSA. We added 4.0 x 10^10 phage to 5 microliters of a
50% slurry of either agarose-immobilized anhydro-trypsin beads
(Pierce Chemical Co.) or agarose-immobilized streptavidin
beads (Sigma) in TBS/BSA. Following a 3 hour incubation at
room temperature, the beads were pelleted by centrifugation
for 30 seconds at 5000 rpm in a microfuge and the supernatant
fraction was collected. The beads were washed 5 times with
TBS/Tween buffer (PARM88) and after each wash the beads were
pelleted by centrifugation and the supernatant was removed.
Finally, beads were resuspended in elution buffer (0.1 N HCl
containing 1.0 mg/ml BSA adjusted to pH 2.2 with glycine) and
following a 5 minute incubation at room temperature, the beads
were pelleted by centrifugation. The supernatant was removed
and neutralized by the addition of 1.0 M Tris-HCl buffer, pH
8.0.

Aliquots of phage samples were applied to a Nytran
membrane using a Schleicher and Schuell (Keene, NH) filtration
minifold and phage DNA was immobilized onto the Nytran by
baking at 80°C for 2 hours. The baked filter was incubated at
42°C for 1 hour in pre-wash solution (MANI82) and
pre-hybridization solution (5Prime-3Prime, West Chester, PA).
The 1.0 Kb NarI (base 1630)/XmnI (base 2646) DNA fragment from
MK RF was radioactively labelled with 32P-dCTP using an
oligolabelling kit (Pharmacia, Piscataway, NJ). The
radioactive probe was added to the Nytran filter in
hybridization solution (5Prime-3Prime) and, following
overnight incubation at 42°C, the filter was washed and
subjected to autoradiography.

The efficiency of this affinity selection system can be 
semi-quantitatively determined using the dot-blot procedure 
described elsewhere in the present application. Exposure of 
MK-BPTI-phage-treated anhydro-trypsin beads to elution buffer 
releases bound MK-BPTI phage. Streptavidin beads do not 
retain phage MK-BPTI. Anhydro-trypsin beads do not retain 
phage MK. In the experiment depicted in Table 115, we 
estimate that 20% of the total MK-BPTI phage were bound to 5 
microliters of the immobilized anhydro-trypsin and were 
subsequently recovered by washing the beads with elution 
buffer (pH 2.2 HCl/glycine). Under the same conditions, no 
detectable MK-BPTI phage were bound and subsequently recovered 
from the streptavidin beads. The amount of MK-BPTI phage 
recovered in the elution fraction is proportional to the 
amount of immobilized anhydro-trypsin added to the phage. No 
detectable MK phage were bound to either the immobilized 
anhydro-trypsin or streptavidin beads and no phage were 
recovered with elution buffer. These data indicate that the 
affinity selection system described above can be utilized to 
select for phage displaying a specific folded protein (in this 
case, BPTI). Unfolded or incomplete BPTI domains are not 
expected to bind anhydro-trypsin. 
Affinity Selection with Anti-BPTI Antibodies 

MK-BPTI and MK phage were diluted to a concentration of 
1·10^10 particles per ml in Tris buffered saline solution 
(PARM88) containing 1.0 mg/ml BSA. Two·10^8 phage were added to 
2.5 µg of either biotinylated rabbit anti-BPTI IgG in TBS/BSA 
or biotinylated rabbit anti-mouse antibody IgG (Sigma) in 
TBS/BSA, and incubated overnight at 4 °C. A 50% slurry of 
streptavidin-agarose (Sigma), washed three times with TBS 
buffer prior to incubation with 30 mg/ml BSA in TBS buffer for 
60 minutes at room temperature, was washed three times with 
TBS/Tween buffer (PARM88) and resuspended to a final 
concentration of 50% in this buffer. Samples containing phage 
and biotinylated IgG were diluted with TBS/Tween prior to the 
addition of streptavidin-agarose in TBS/Tween buffer. 
Following a 60 minute incubation at room temperature, 
streptavidin-agarose beads were pelleted by centrifugation for 
30 seconds and the supernatant fraction was collected. The 
beads were washed 5 times with TBS/Tween buffer and after each 
wash, the beads were pelleted by centrifugation and the 
supernatant was removed. Finally, the streptavidin-agarose 
beads were resuspended in elution buffer (0.1 N HCl containing 
1.0 mg/ml BSA adjusted to pH 2.2 with glycine), incubated 5 
minutes at room temperature, and pelleted by centrifugation. 
The supernatant was removed and neutralized by the addition of 
1.0 M Tris-HCl buffer, pH 8.0. 

Aliquots of phage samples were applied to a Nytran 
membrane using a Schleicher and Schuell minifold apparatus. 
Phage DNA was immobilized onto the Nytran by baking at 80 °C 
for 2 hours. Filters were washed for 60 minutes in pre-wash 
solution (MANI82) at 42 °C then incubated at 42 °C for 60 
minutes in Southern pre-hybridization solution 
(5Prime-3Prime). The 1.0 Kb NarI (1630 bp)/XmnI (2646 bp) DNA 
fragment from MK RF was radioactively labelled with 32P-α-dCTP 
using an oligolabelling kit (Pharmacia, Piscataway, NJ). Nytran 
membranes were transferred from pre-hybridization solution to 
Southern hybridization solution (5Prime-3Prime) at 42 °C. The 
radioactive probe was added to the hybridization solution and, 
following overnight incubation at 42 °C, the filter was washed 
3 times with 2 x SSC, 0.1% SDS at room temperature and once at 
65 °C in 2 x SSC, 0.1% SDS. Nytran membranes were subjected to 
autoradiography. The efficiency of the affinity selection 
system can be semi-quantitatively determined using the above 
dot blot procedure. Comparison of dots A1 and B1 or C1 and D1 
indicates that the majority of phage did not stick to the 
streptavidin-agarose beads. Washing with TBS/Tween buffer 
removes the majority of phage which are non-specifically 
associated with streptavidin beads. Exposure of the 
streptavidin beads to elution buffer releases bound phage only 
in the case of MK-BPTI phage which have previously been 
incubated with biotinylated rabbit anti-BPTI IgG. These data 
indicate that the affinity selection system described above 
can be utilized to select for phage displaying a specific 
antigen (in this case BPTI). We estimate an enrichment factor 
of at least 40 fold based on the calculation 

Enrichment Factor = (Percent MK-BPTI phage recovered) / (Percent MK phage recovered) 
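
The enrichment calculation above can be sketched in a few lines of Python. This is a minimal illustration only: the function names and the phage counts are hypothetical and are not taken from the dot blots, which are only semi-quantitative. When no control phage are detected, the detection limit of the assay would stand in for the control recovery, which is why the result is read as a lower bound ("at least").

```python
def percent_recovered(eluted_phage: float, input_phage: float) -> float:
    """Percent of the input phage found in the elution fraction."""
    if input_phage <= 0:
        raise ValueError("input_phage must be positive")
    return 100.0 * eluted_phage / input_phage


def enrichment_factor(pct_display: float, pct_control: float) -> float:
    """Percent displaying phage recovered divided by percent control phage recovered."""
    if pct_control <= 0:
        # No control phage detected: substitute the assay detection limit instead.
        raise ValueError("pct_control must be positive (use the detection limit)")
    return pct_display / pct_control


# Hypothetical counts chosen for illustration; they are NOT data from this Example.
pct_mk_bpti = percent_recovered(eluted_phage=4.0e7, input_phage=2.0e8)
pct_mk = percent_recovered(eluted_phage=1.0e6, input_phage=2.0e8)
print(f"enrichment factor: {enrichment_factor(pct_mk_bpti, pct_mk):.0f}-fold")
```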

EXAMPLE III 

CHARACTERIZATION AND FRACTIONATION OF CLONALLY PURE 
POPULATIONS OF PHAGE, EACH DISPLAYING A SINGLE CHIMERIC 
APROTININ HOMOLOGUE/M13 GENE III PROTEIN: 

This Example demonstrates that chimeric phage proteins 
displaying a target-binding domain can be eluted from 
immobilized target by decreasing pH, and the pH at which the 
protein is eluted is dependent on the binding affinity of the 
domain for the target. 

Standard Procedures: 

Unless otherwise noted, all manipulations were carried 
out at room temperature. Unless otherwise noted, all cells 
are XL1-Blue (TM) (Stratagene, La Jolla, CA). 

1) Demonstration of the Binding of BPTI-III MK Phage to Active 
Trypsin Beads 

Previous experiments designed to verify that BPTI 
displayed by fusion phage is functional relied on the use of 
immobilized anhydro-trypsin, a catalytically inactive form of 
trypsin. Although anhydro-trypsin is essentially identical to 
trypsin structurally (HUBE75, YOKO77) and in binding 
properties (VINC74, AKOH72), we demonstrated that BPTI-III 
fusion phage also bind immobilized active trypsin. 
Demonstration of the binding of fusion phage to immobilized 
active protease and subsequent recovery of infectious phage 
facilitates subsequent experiments where the preparation of 
inactive forms of serine proteases by protein modification is 
laborious or not feasible. 



Fifty µl of BPTI-III MK phage (identified as MK-BPTI in 
Example II) (3.7·10^11 pfu/ml) in either 50 mM Tris, pH 7.5, 150 
mM NaCl, 1.0 mg/ml BSA (TBS/BSA) buffer or 50 mM sodium 
citrate, pH 6.5, 150 mM NaCl, 1.0 mg/ml BSA (CBS/BSA) buffer 
were added to 10 µl of a 25% slurry of immobilized trypsin 
(Pierce Chemical Co., Rockford, IL) also in TBS/BSA or 
CBS/BSA. As a control, 50 µl MK phage (9.3·10^12 pfu/ml) were 
added to 10 µl of a 25% slurry of immobilized trypsin in 
either TBS/BSA or CBS/BSA buffer. The infectivity of BPTI-III 
MK phage is 25-fold lower than that of MK phage; thus the 
conditions chosen above ensure that an approximately 
equivalent number of phage particles are added to the trypsin 
beads. After 3 hours of mixing on a Labquake shaker 
(Labindustries Inc., Berkeley, CA), 0.5 ml of either TBS/BSA or 
CBS/BSA was added where appropriate to the samples. Beads 
were washed for 5 min and recovered by centrifugation for 30 
sec. The supernatant was removed and 0.5 ml of TBS/0.1% 
Tween-20 was added. The beads were mixed for 5 minutes on the 
shaker and recovered by centrifugation as above. The 
supernatant was removed and the beads were washed an 
additional five times with TBS/0.1% Tween-20 as described 
above. Finally, the beads were resuspended in 0.5 ml of 
elution buffer (0.1 M HCl containing 1.0 mg/ml BSA adjusted to 
pH 2.2 with glycine), mixed for 5 minutes and recovered by 
centrifugation. The supernatant fraction was removed and 
neutralized by the addition of 130 µl of 1 M Tris, pH 8.0. 
Aliquots of the neutralized elution sample were diluted in LB 
broth and titered for plaque-forming units on a lawn of cells. 
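
The statement that the two inputs contain approximately equal numbers of particles can be checked with simple arithmetic. The sketch below assumes, as the text does, that particle number scales as pfu divided by relative infectivity. The function names are hypothetical, and the result is a relative particle number, not an absolute count.

```python
def plaque_forming_units(titer_pfu_per_ml: float, volume_ul: float) -> float:
    """Total pfu contained in a given volume (microliters) of a phage stock."""
    return titer_pfu_per_ml * volume_ul * 1e-3  # 1 ul = 1e-3 ml


def relative_particles(pfu: float, relative_infectivity: float) -> float:
    """Particle number relative to a reference phage whose infectivity is 1.0."""
    return pfu / relative_infectivity


# Titers and volumes as stated above; MK phage is taken as the reference.
bpti_mk = relative_particles(plaque_forming_units(3.7e11, 50), 1 / 25)
mk = relative_particles(plaque_forming_units(9.3e12, 50), 1.0)
print(f"BPTI-III MK / MK particle ratio: {bpti_mk / mk:.2f}")
```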

Table 201 illustrates that a significant percentage of 
the input BPTI-III MK phage bound to immobilized trypsin and 
was recovered by washing with elution buffer. The amount of 
fusion phage which bound to the beads was greater in TBS 
buffer (pH 7.5) than in CBS buffer (pH 6.5). This is 
consistent with the observation that the affinity of BPTI for 
trypsin is greater at pH 7.5 than at pH 6.5 (VINC72, VINC74). 
A much lower percentage of the MK control phage (which do not 
display BPTI) bound to immobilized trypsin and this binding 
was independent of the pH conditions. At pH 6.5, 1675 times 
more of the BPTI-III MK phage than of the MK phage bound to 
trypsin beads while at pH 7.5, a 2103-fold difference was 
observed. Hence fusion phage displaying BPTI adhere not only 
to anhydro-trypsin beads but also to active trypsin beads and 
can be recovered as infectious phage. These data, in 
conjunction with earlier findings, strongly suggest that BPTI 
displayed on the surface of fusion phage is appropriately 
folded and functional. 

2) Generation of P1 Mutants of BPTI 

To demonstrate the specificity of interaction of BPTI-III 
fusion phage with immobilized serine proteases, single amino 
acid substitutions were introduced at the P1 position (residue 
15 of mature BPTI) of the BPTI-III fusion protein by 
site-directed mutagenesis. A 25mer mutagenic oligonucleotide (P1) 
was designed to substitute a LEU codon for the LYS15 codon. 
This alteration is desired because BPTI(K15L) is a moderately 
good inhibitor of human neutrophil elastase (HNE) (Kd = 2.9·10^-9 
M) (BECK88b) and a poor inhibitor of trypsin. A fusion phage 
displaying BPTI(K15L) should bind to immobilized HNE but not 
to immobilized trypsin. BPTI-III MK fusion phage would be 
expected to display the opposite phenotype (bind to trypsin, 
fail to bind to HNE). These observations would illustrate the 
binding specificity of BPTI-III fusion phage for immobilized 
serine proteases. 

Mutagenesis of the P1 region of the BPTI-VIII gene 
contained within the intergenic region of recombinant phage 
MB46 was carried out using the Muta-Gene M13 In Vitro 
Mutagenesis Kit (Bio-Rad, Richmond, CA). MB46 phage (7.5·10^6 
pfu) were used to infect a 50 ml culture of CJ236 cells 
(O.D.600 = 0.5). Following overnight incubation at 37 °C, 
phage were recovered and uracil-containing single-stranded 
DNA was extracted from the phage. The single-stranded DNA was 
further purified by NACS chromatography as recommended by the 
manufacturer (B.R.L., Gaithersburg, MD). 

Two hundred nanograms of the purified single-stranded DNA 
were annealed to 3 picomoles of the phosphorylated 25mer 
mutagenic oligonucleotide (P1). Following filling in with T4 
DNA polymerase and ligation with T4 DNA ligase, the sample was 
used to transfect competent cells which were subsequently 
plated on LB plates to permit the formation of plaques. Phage 
derived from picked plaques were applied to a Nytran membrane 
using a Schleicher and Schuell (Keene, NH) minifold I 
apparatus (Dot Blot Procedure). Phage DNA was immobilized 
onto the filter by baking at 80 °C for 2 hours. The filter was 
bathed in 1 X Southern pre-hybridization buffer 
(5Prime-3Prime, West Chester, PA) for 2 hours. Subsequently, the 
filter was incubated in 1 X Southern hybridization solution 
(5Prime-3Prime) containing a 21mer probing oligonucleotide 
(LEU1) which had been radioactively labelled with gamma-32P-ATP 
(N.E.N./DuPont, Boston, MA) by T4 polynucleotide kinase (New 
England BioLabs (NEB), Beverly, MA). Following overnight 
hybridization, the filter was washed 3 times with 6 X SSC at 
room temperature and once at 60 °C in 6 X SSC prior to 
autoradiography. Clones exhibiting strong hybridization 
signals were chosen for large scale Rf preparation using the 
PZ523 spin column protocol (5Prime-3Prime). Restriction 
enzyme analysis confirmed that the structure of the Rf was 
correct and DNA sequencing confirmed the substitution of a LEU 
codon (TTG) for the LYS15 codon (AAA). This Rf DNA was 
designated MB46(K15L). 

3) Generation of the BPTI-III MA Vector 

The original gene III fusion phage MK can be detected on 
the basis of its ability to transduce cells to kanamycin 
resistance (Km^R). It was deemed advantageous to generate a 
second gene III fusion vector which can confer resistance to a 
different antibiotic, namely ampicillin (Ap). One could then 
mix a fusion phage conferring Ap^R while displaying engineered 
protease inhibitor A (EPI-A) with a second fusion phage 
conferring Km^R while displaying EPI-B. The mixture could be 
added to an immobilized serine protease and, following elution 
of bound fusion phage, one could evaluate the relative 
affinity of the two EPIs for the immobilized protease from the 
relative abundance of phage that transduce cells to Km^R or Ap^R. 

The Ap^R gene is contained in the vector pGem3Zf (Promega 
Corp., Madison, WI) which can be packaged as single stranded 
DNA contained in bacteriophage when helper phage are added to 
bacteria containing this vector. The recognition sites for 
restriction enzymes SmaI and SnaBI were engineered into the 3' 
non-coding region of the Ap^R (β-lactamase) gene using the 
technique of synthetic oligonucleotide directed site specific 
mutagenesis. The single stranded DNA was used as the template 
for in vitro mutagenesis leading to the following DNA sequence 
alterations (numbering as supplied by Promega): a) to create a 
SmaI (or XmaI) site, bases T1115 --> C and A1116 --> C, and b) to 
create a SnaBI site, G1125 --> T, C1129 --> T, and T1130 --> A. The 
alterations were confirmed by radiolabelled probe analysis 
with the mutating oligonucleotide and restriction enzyme 
analysis; this plasmid is named pSGK3. 

Plasmid pSGK3 was cut with AatII and SmaI and treated with 
T4 DNA polymerase (NEB) to remove overhanging 3' ends (MANI82, 
SAMB89). Phosphorylated HindIII linkers (NEB) were ligated to 
the blunt ends of the DNA and, following HindIII digestion, the 
1.1 kb fragment was isolated by agarose gel electrophoresis 
followed by purification on an Ultrafree-MC filter unit as 
recommended by the manufacturer (Millipore, Bedford, MA). 
M13-MB1/2-delta Rf DNA was cut with HindIII and the linearized 
Rf was purified and ligated to the 1.1 kb fragment derived 
from pSGK3. Ligation samples were used to transfect competent 
cells which were plated on LB plates containing Ap. Colonies 
were picked and grown in LB broth containing Ap overnight at 
37 °C. Aliquots of the culture supernatants were assayed for 
the presence of infectious phage. Rf DNA was prepared from 
cultures which were both Ap^R and contained infectious phage. 
Restriction enzyme analysis confirmed that the Rf contained a 
single copy of the Ap^R gene inserted into the intergenic region 
of the M13 genome in the same transcriptional orientation as 
the phage genes. This Rf DNA was designated MA. 

The 5.9 kb BglII/BsmI fragment from MA Rf DNA and the 2.2 
kb BglII/BsmI fragment from BPTI-III MK Rf DNA were ligated 
together and a portion of the ligation mixture was used to 
transfect competent cells which were subsequently plated to 
permit plaque formation on a lawn of cells. Large and small 
size plaques were observed on the plates. Small size plaques 
were picked for further analysis since BPTI-III fusion phage 
give rise to small plaques due to impairment of gene III 
protein function. Small plaques were added to LB broth 
containing Ap and cultures were incubated overnight at 37 °C. 
An Ap^R culture which contained phage which gave rise to small 
plaques when plated on a lawn of cells was used as a source of 
Rf DNA. Restriction enzyme analysis confirmed that the 
BPTI-III fusion gene had been inserted into the MA vector. This Rf 
was designated BPTI-III MA. 

4) Construction of BPTI(K15L)-III MA 

MB46(K15L) Rf DNA was digested with XhoI and EagI and the 
125 bp DNA fragment was isolated by electrophoresis on a 2% 
agarose gel followed by extraction from an agarose slice by 
centrifugation through an Ultrafree-MC filter unit. The 8.0 
kb XhoI/EagI fragment derived from BPTI-III MA Rf was also 
prepared. The above two fragments were ligated and the 
ligation sample was used to transfect competent cells which 
were plated on LB plates containing Ap. Colonies were picked 
and used to inoculate LB broth containing Ap. Cultures were 
incubated overnight at 37 °C and phage within the culture 
supernatants were probed using the Dot Blot Procedure. Filters 
were hybridized to a radioactively labelled oligonucleotide 
(LEU1). Positive clones were identified by autoradiography 
after washing filters under high stringency conditions. Rf 
DNA was prepared from Ap^R cultures which contained phage 
carrying the K15L mutation. Restriction enzyme analysis and 
DNA sequencing confirmed that the K15L mutation had been 
introduced into the BPTI-III MA Rf. This Rf was designated 
BPTI(K15L)-III MA. Interestingly, BPTI(K15L)-III MA phage 
gave rise to extremely small plaques on a lawn of cells and 
the infectivity of the phage is 4 to 5 fold less than that of 
BPTI-III MK phage. This suggests that the substitution of LEU 
for LYS15 impairs the ability of the BPTI:gene III fusion 
protein to mediate phage infection of bacterial cells. 






5) Preparation of Immobilized Human Neutrophil Elastase 

One ml of Reacti-Gel 6X CDI activated agarose (Pierce 
Chemical Co.) in acetone (200 µl packed beads) was introduced 
into an empty Select-D spin column (5Prime-3Prime). The 
acetone was drained out and the beads were washed twice 
rapidly with 1.0 ml of ice cold water and 1.0 ml of ice cold 
100 mM boric acid, pH 8.5, 0.9% NaCl. Two hundred µl of 2.0 
mg/ml human neutrophil elastase (HNE) (CalBiochem, San Diego, 
CA) in borate buffer were added to the beads. The column was 
sealed and mixed end over end on a Labquake Shaker at 4 °C for 
36 hours. The HNE solution was drained off and the beads were 
washed with ice cold 2.0 M Tris, pH 8.0 over a 2 hour period 
at 4 °C to block remaining reactive groups. A 50% slurry of 
the beads in TBS/BSA was prepared. To this was added an equal 
volume of sterile 100% glycerol and the beads were stored as a 
25% slurry at -20 °C. Prior to use, the beads were washed 3 
times with TBS/BSA and a 50% slurry in TBS/BSA was prepared. 

6) Characterization of the Affinity of BPTI-III MK and 
BPTI(K15L)-III MA Phage for Immobilized Trypsin and Human 
Neutrophil Elastase 

Thirty µl of BPTI-III MK phage in TBS/BSA (1.7·10^11 
pfu/ml) was added to 5 µl of a 50% slurry of either 
immobilized human neutrophil elastase or immobilized trypsin 
(Pierce Chemical Co.) also in TBS/BSA. Similarly, 30 µl of 
BPTI(K15L)-III MA phage in TBS/BSA (3.2·10^10 pfu/ml) was added 
to either immobilized HNE or trypsin. Samples were mixed on a 
Labquake shaker for 3 hours. The beads were washed with 0.5 
ml of TBS/BSA for 5 minutes and recovered by centrifugation. 
The supernatant was removed and the beads were washed 5 times 
with 0.5 ml of TBS/0.1% Tween-20. Finally, the beads were 
resuspended in 0.5 ml of elution buffer (0.1 M HCl containing 
1.0 mg/ml BSA adjusted to pH 2.2 with glycine), mixed for 5 
minutes and recovered by centrifugation. The supernatant 
fraction was removed, neutralized with 130 µl of 1 M Tris, pH 
8.0, diluted in LB broth, and titered for plaque-forming units 
on a lawn of cells. 

Table 202 illustrates that 82 times more of the BPTI-III 
MK input phage bound to the trypsin beads than to the HNE 
beads. By contrast, the BPTI(K15L)-III MA phage bound 
preferentially to HNE beads by a factor of 36. These results 
are consistent with the known affinities of wild type and the 
K15L variant of BPTI for trypsin and HNE. Hence BPTI-III 
fusion phage bind selectively to immobilized proteases and the 
nature of the BPTI variant displayed on the surface of the 
fusion phage dictates which particular protease is the optimum 
receptor for the fusion phage. 

7) Effect of pH on the Dissociation of Bound BPTI-III MK and 
BPTI(K15L)-III MA Phage from Immobilized Neutrophil Elastase 

The affinity of a given fusion phage for an immobilized 
serine protease can be characterized on the basis of the 
amount of bound fusion phage which elutes from the beads by 
washing with a pH 2.2 buffer. This represents rather extreme 
conditions for the dissociation of fusion phage from beads. 
Since the affinity of the BPTI variants described above for 
HNE is not high (Kd > 1·10^-9 M) it was anticipated that fusion 
phage displaying these variants might dissociate from HNE 
beads under less severe pH conditions. Furthermore, fusion 
phage might dissociate from HNE beads under specific pH 
conditions characteristic of the particular BPTI variant 
displayed by the phage. Low pH buffers providing stringent 
wash conditions might be required to dissociate fusion phage 
displaying a BPTI variant with a high affinity for HNE whereas 
neutral pH conditions might be sufficient to dislodge a fusion 
phage displaying a BPTI variant with a weak affinity for HNE. 

Thirty µl of BPTI(K15L)-III MA phage (1.7·10^10 pfu/ml in 
TBS/BSA) were added to 5 µl of a 50% slurry of immobilized HNE 
also in TBS/BSA. Similarly, 30 µl of BPTI-III MA phage 
(8.6·10^10 pfu/ml in TBS/BSA) were added to 5 µl of immobilized 
HNE. The above conditions were chosen to ensure that an 
approximately equivalent number of phage particles were added 
to the beads. The samples were incubated for 3 hours on a 
Labquake shaker. The beads were washed with 0.5 ml of TBS/BSA 
for 5 min on the shaker, recovered by centrifugation and the 
supernatant was removed. The beads were washed with 0.5 ml of 
TBS/0.1% Tween-20 for 5 minutes and recovered by 
centrifugation. Four additional washes with TBS/0.1% Tween-20 
were performed as described above. The beads were washed as 
above with 0.5 ml of 100 mM sodium citrate, pH 7.0 containing 
1.0 mg/ml BSA. The beads were recovered by centrifugation and 
the supernatant was removed. Subsequently, the HNE beads were 
washed sequentially with a series of 100 mM sodium citrate, 
1.0 mg/ml BSA buffers of pH 6.0, 5.0, 4.0 and 3.0 and finally 
with the pH 2.2 elution buffer described above. The pH washes 
were neutralized by the addition of 1 M Tris, pH 8.0, diluted 
in LB broth and titered for plaque-forming units on a lawn of 
cells. 

Table 203 illustrates that a low percentage of the input 
BPTI-III MK fusion phage adhered to the HNE beads and was 
recovered in the pH 7.0 and 6.0 washes predominantly. By 
contrast, a significantly higher percentage of the 
BPTI(K15L)-III MA phage bound to the HNE beads and was recovered 
predominantly in the pH 5.0 and 4.0 washes. Hence lower pH 
conditions (i.e. more stringent) are required to dissociate 
BPTI(K15L)-III MA than BPTI-III MK phage from immobilized HNE. 
The affinity of BPTI(K15L) is over 1000 times greater than 
that of BPTI for HNE (based on reported Kd values (BECK88b)). 
Hence this suggests that lower pH conditions are indeed 
required to dissociate fusion phage displaying a BPTI variant 
with a higher affinity for HNE. 
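
The comparison of elution profiles described above can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, and the titers in the example are invented placeholders, not the values of Table 203. Each pH wash is expressed as a percentage of the input phage, and the pH of the wash that released the most phage is reported.

```python
from typing import Dict


def elution_profile(pfu_by_ph: Dict[float, float], input_pfu: float) -> Dict[float, float]:
    """Percent of input phage recovered in each pH wash."""
    if input_pfu <= 0:
        raise ValueError("input_pfu must be positive")
    return {ph: 100.0 * pfu / input_pfu for ph, pfu in pfu_by_ph.items()}


def peak_ph(profile: Dict[float, float]) -> float:
    """pH of the wash that released the largest share of the input phage."""
    return max(profile, key=profile.get)


# Placeholder titers (pfu per wash), for illustration only.
weak_binder = {7.0: 5.0e3, 6.0: 4.0e3, 5.0: 1.0e3, 4.0: 2.0e2, 3.0: 5.0e1, 2.2: 1.0e1}
strong_binder = {7.0: 1.0e3, 6.0: 5.0e3, 5.0: 6.0e4, 4.0: 4.0e4, 3.0: 2.0e3, 2.2: 5.0e2}

for name, washes in (("weak", weak_binder), ("strong", strong_binder)):
    profile = elution_profile(washes, input_pfu=5.0e8)
    print(name, "binder peaks at pH", peak_ph(profile))
```

A lower peak pH for one phage than for another, under identical wash conditions, is the signature the text associates with higher affinity for the immobilized protease.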

8) Construction of BPTI(MGNG)-III MA Phage (MGNG has SEQ ID 
NO:12) 

The light chain of bovine inter-α-trypsin inhibitor 
contains 2 domains highly homologous to BPTI. The amino 
terminal proximal domain (called BI-8e) has been generated by 
proteolysis and shown to be a potent inhibitor of HNE (Kd = 
4.4·10^-11 M) (ALBR83). By contrast, a BPTI variant with the 
single substitution of LEU for LYS15 exhibits a moderate 
affinity for HNE (Kd = 2.9·10^-9 M) (BECK88b). It has been 
proposed that the P1 residue is the primary determinant of the 
specificity and potency of BPTI-like molecules (BECK88b, 
LASK80 and works cited therein). Although both BI-8e and 
BPTI(K15L) feature LEU at their respective P1 positions, there 
is a 66 fold difference in the affinities of these molecules 
for HNE. Structural features, other than the P1 residue, must 
contribute to the affinity of BPTI-like molecules for HNE. 
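
The 66 fold figure follows directly from the two dissociation constants quoted above. A minimal check, with a hypothetical helper name:

```python
def fold_difference(kd_weaker: float, kd_tighter: float) -> float:
    """Ratio of two dissociation constants given in the same units (here, molar)."""
    if kd_weaker <= 0 or kd_tighter <= 0:
        raise ValueError("dissociation constants must be positive")
    return kd_weaker / kd_tighter


KD_BPTI_K15L_HNE = 2.9e-9   # M, value quoted in the text (BECK88b)
KD_BI8E_HNE = 4.4e-11       # M, value quoted in the text (ALBR83)

print(f"{fold_difference(KD_BPTI_K15L_HNE, KD_BI8E_HNE):.0f}-fold")
```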

A comparison of the structures of BI-8e and BPTI(K15L) 
reveals the presence of three positively charged residues at 
positions 39, 41, and 42 of BPTI which are absent in BI-8e. 
These hydrophilic and highly charged residues of BPTI are 
displayed on a loop which underlies the loop containing the P1 
residue and is connected to it via a disulfide bridge. 
Residues within the underlying loop (in particular residue 39) 
participate in the interaction of BPTI with the surface of 
trypsin near the catalytic pocket (BLOW72) and may contribute 
significantly to the tenacious binding of BPTI to trypsin. 
However, these hydrophilic residues might hamper the docking 
of BPTI variants with HNE. In support of this hypothesis, 
BI-8e displays a high affinity for HNE and contains no charged 
residues in the region spanning residues 39-42. Hence 
residues 39 through 42 of wild type BPTI were replaced with 
the corresponding residues of the human homologue of BI-8e. 
We anticipated that a BPTI derivative containing the 
MET-GLY-ASN-GLY (MGNG) sequence (SEQ ID NO:12) would exhibit a higher 
affinity for HNE than corresponding derivatives which retain 
the sequence of wild type BPTI at residues 39-42. 
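
The substitutions discussed above can be made concrete at the protein level. The sketch below is an assumption-laden illustration: the 58-residue mature BPTI sequence used here is the commonly published one and is not reproduced in this section, and the helper `substitute` is hypothetical. It applies K15L and the replacement of residues 39-42 with MGNG, and refuses to proceed if the residues found at those positions are not the ones expected.

```python
# Commonly published mature BPTI sequence (58 residues); an assumption, since
# this section does not list it.
WILD_TYPE_BPTI = "RPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGA"


def substitute(seq: str, position: int, expected: str, replacement: str) -> str:
    """Replace `expected` starting at 1-based `position` with `replacement`."""
    if len(expected) != len(replacement):
        raise ValueError("expected and replacement must have equal length")
    start = position - 1
    end = start + len(expected)
    if seq[start:end] != expected:
        raise ValueError(f"found {seq[start:end]!r} at {position}, expected {expected!r}")
    return seq[:start] + replacement + seq[end:]


bpti_k15l = substitute(WILD_TYPE_BPTI, 15, "K", "L")
bpti_k15l_mgng = substitute(bpti_k15l, 39, "RAKR", "MGNG")

# Residues 39, 41 and 42 of the wild type are the positively charged ones.
print("wild type 39-42:", WILD_TYPE_BPTI[38:42])
print("variant   39-42:", bpti_k15l_mgng[38:42])
```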

A double stranded oligonucleotide with AccI and EagI 
compatible ends was designed to introduce the desired 
alteration of residues 39 to 42 via cassette mutagenesis. 
Codon 45 was altered to create a new XmnI site, unique in the 
structure of the BPTI gene, which could be used to screen for 
mutants. This alteration at codon 45 does not alter the 
encoded amino-acid sequence. BPTI-III MA Rf DNA was digested 
with AccI. Two oligonucleotides (CYSB and CYST) corresponding 
to the bottom and top strands of the mutagenic DNA were 
annealed and ligated to the AccI digested BPTI-III MA Rf DNA. 
The sample was digested with BglII and the 2.1 kb BglII/EagI 
fragment was purified. BPTI-III MA Rf was also digested with 
BglII and EagI and the 6.0 kb fragment was isolated and 
ligated to the 2.1 kb BglII/EagI fragment described above. 
Ligation samples were used to transfect competent cells which 
were plated to permit the formation of plaques on a lawn of 
cells. Phage derived from plaques were probed with a 
radioactively labelled oligonucleotide (CYSB) using the Dot 
Blot Procedure. Positive clones were identified by 
autoradiography of the Nytran membrane after washing at high 
stringency conditions. Rf DNA was prepared from Ap^R cultures 
containing fusion phage which hybridized to the CYSB probe. 
Restriction enzyme analysis and DNA sequencing confirmed that 
codons 39-42 of BPTI had been altered. The Rf DNA was 
designated BPTI(MGNG)-III MA (the amino acid sequence MGNG has 
SEQ ID NO:12; BPTI(MGNG)-III MA denotes a strain of M13 that 
displays BPTI(MGNG) fused to the gIII protein and that carries 
the bla gene that confers Ap^R). 
9) Construction of BPTI(K15L,MGNG)-III MA (MGNG has SEQ ID 
NO:12) 

BPTI(MGNG)-III MA Rf DNA (MGNG has SEQ ID NO:12) was 
digested with AccI and the 5.6 kb fragment was purified. 
BPTI(K15L)-III MA was digested with AccI and the 2.5 kb DNA 
fragment was purified. The two fragments above were ligated 
together and ligation samples were used to transfect competent 
cells which were plated for plaque production. Large and 
small plaques were observed on the plate. Representative 
plaques of each type were picked and phage were probed with 
the LEU1 oligonucleotide via the Dot Blot Procedure. After 
the Nytran filter had been washed under high stringency 
conditions, positive clones were identified by 
autoradiography. Only the phage which hybridized to the LEU1 
oligonucleotide gave rise to the small plaques, confirming an 
earlier observation that substitution of LEU for LYS15 
substantially reduces phage infectivity. Appropriate cultures 
containing phage which hybridized to the LEU1 oligonucleotide 
were used to prepare Rf DNA. Restriction enzyme analysis and 
DNA sequencing confirmed that the K15L mutation had been 
introduced into BPTI(MGNG)-III MA (MGNG has SEQ ID NO:12). 
This Rf DNA was designated BPTI(K15L,MGNG)-III MA (MGNG has 
SEQ ID NO:12). 

10) Effect of Mutation of Residues 39-42 of BPTI(K15L) on its 
Affinity for Immobilized HNE 

Thirty µl of BPTI(K15L,MGNG)-III MA phage (9.2·10^9 pfu/ml 
in TBS/BSA) (MGNG has SEQ ID NO:12) were added to 5 µl of a 
50% slurry of immobilized HNE also in TBS/BSA. Similarly, 30 
µl of BPTI(K15L)-III MA phage (1.2·10^10 pfu/ml in TBS/BSA) were 
added to immobilized HNE. The samples were incubated for 3 
hours on a Labquake shaker. The beads were washed for 5 min 
with 0.5 ml of TBS/BSA and recovered by centrifugation. The 
beads were washed 5 times with 0.5 ml of TBS/0.1% Tween-20 as 
described above. Finally, the beads were washed sequentially 
with a series of 100 mM sodium citrate buffers of pH 7.0, 6.0, 
5.5, 5.0, 4.75, 4.5, 4.25, 4.0 and 3.5 as described above. pH 
washes were neutralized, diluted in LB broth and titered for 
plaque-forming units on a lawn of cells. 

Table 204 illustrates that almost twice as much of the 
BPTI(K15L,MGNG)-III MA (MGNG has SEQ ID NO:12) as 
BPTI(K15L)-III MA phage bound to HNE beads. In both cases the pH 4.75 
fraction contained the largest proportion of the recovered 
phage. This confirms that replacement of residues 39-42 of 
wild type BPTI with the corresponding residues of BI-8e 
enhances the binding of the BPTI(K15L) variant to HNE. 
11) Fractionation of a Mixture of BPTI-III MK and 
BPTI(K15L,MGNG)-III MA Fusion Phage (MGNG has SEQ ID NO:12) 

The observations described above indicate that 
BPTI(K15L,MGNG)-III MA (MGNG has SEQ ID NO:12) and BPTI-III MK 
phage exhibit different pH elution profiles from immobilized 
HNE. It seemed plausible that this property could be 
exploited to fractionate a mixture of different fusion phage. 

Fifteen µl of BPTI-III MK phage (3.92·10^10 pfu/ml in 
TBS/BSA), equivalent to 8.91·10^7 Km^R transducing units, were 
added to 15 µl of BPTI(K15L,MGNG)-III MA (MGNG has SEQ ID 
NO:12) phage (9.85·10^9 pfu/ml in TBS/BSA), equivalent to 
4.44·10^7 Ap^R transducing units. Five µl of a 50% slurry of 
immobilized HNE in TBS/BSA was added to the phage and the 
sample was incubated for 3 hours on a Labquake mixer. The 
beads were washed for 5 minutes with 0.5 ml of TBS/BSA prior 
to being washed 5 times with 0.5 ml of TBS/2.0% Tween-20 as 
described above. Beads were washed for 5 minutes with 0.5 ml 
of 100 mM sodium citrate, pH 7.0 containing 1.0 mg/ml BSA. 
The beads were recovered by centrifugation and the supernatant 
was removed. Subsequently, the HNE beads were washed 
sequentially with a series of 100 mM citrate buffers of pH 
6.0, 5.0 and 4.0. The pH washes were neutralized by the 
addition of 130 µl of 1 M Tris, pH 8.0. 



The relative proportion of BPTI-III MK and
BPTI(K15L,MGNG)-III MA (MGNG has SEQ ID NO:12) phage in each
pH fraction was evaluated by determining the number of phage
able to transduce cells to Km^R as opposed to Ap^R. Fusion phage
diluted in 1 X Minimal A salts were added to 100 µl of cells
(O.D.600 = 0.8 concentrated to 1/20 original culture volume)
also in Minimal salts in a final volume of 200 µl. The sample
was incubated for 15 min at 37°C prior to the addition of 200
µl of 2 X LB broth. After an additional 15 min incubation at
37°C, duplicate aliquots of cells were plated on LB plates
containing either Ap or Km to permit the formation of
colonies. Bacterial colonies on each type of plate were
counted and the data were used to calculate the number of Ap^R
and Km^R transducing units in each pH fraction. The number of
Ap^R transducing units is indicative of the amount of
BPTI(K15L,MGNG)-III MA (MGNG has SEQ ID NO:12) phage in each
pH fraction while the total number of Km^R transducing units is
indicative of the amount of BPTI-III MK phage.

Table 205 illustrates that a low percentage of the
BPTI-III MK input phage (as judged by Km^R transducing units)
adhered to the HNE beads and was recovered predominantly in the
pH 7.0 fraction. By contrast, a significantly higher percentage
of the BPTI(K15L,MGNG)-III MA (MGNG has SEQ ID NO:12) phage (as
judged by Ap^R transducing units) adhered to the HNE beads and
was recovered predominantly in the pH 4.0 fraction. A
comparison of the total number of Ap^R and Km^R transducing units
in the pH 4.0 fraction shows that a 984-fold enrichment of
BPTI(K15L,MGNG)-III MA (MGNG has SEQ ID NO:12) phage over
BPTI-III MK phage was achieved. Hence, the above procedure
can be utilized to fractionate mixtures of fusion phage on the
basis of their relative affinities for immobilized HNE.
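
As a rough illustration of how an enrichment figure of this kind can be derived from colony counts, the sketch below divides the Ap^R:Km^R ratio in an eluted fraction by the same ratio in the input mixture. This normalization is an assumption (the text does not state how the 984-fold value was computed), the function name is hypothetical, and the fraction counts in the usage line are invented for illustration. Only the input values (4.44·10^7 Ap^R and 8.91·10^7 Km^R transducing units) come from the text.

```python
def enrichment_factor(ap_fraction, km_fraction, ap_input, km_input):
    """Fold enrichment of Ap^R phage over Km^R phage in one eluted fraction.

    Assumed definition: (Ap/Km ratio in the fraction) divided by
    (Ap/Km ratio in the input mixture).
    """
    if min(ap_fraction, km_fraction, ap_input, km_input) <= 0:
        raise ValueError("all transducing-unit counts must be positive")
    return (ap_fraction / km_fraction) / (ap_input / km_input)


# Input values are taken from the text; the fraction counts are hypothetical.
example = enrichment_factor(ap_fraction=1000, km_fraction=10,
                            ap_input=4.44e7, km_input=8.91e7)
print(f"fold enrichment (hypothetical counts): {example:.1f}")
```

If the published figure was instead the raw Ap^R:Km^R ratio within the pH 4.0 fraction, the division by the input ratio would simply be omitted.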

12) Construction of BPTI(K15V,R17L)-III MA

A BPTI variant containing the alterations K15V and R17L
demonstrates the highest affinity for HNE of any BPTI variant
described to date (Kd = 6·10^-11 M) (AUER89). As a means of
testing the selection system described herein, a fusion phage
displaying this variant of BPTI was generated and used as a
"reference" phage to characterize the affinity for immobilized
HNE of fusion phage displaying a BPTI variant with a known
affinity for free HNE. A 76 bp mutagenic oligonucleotide
(VAL1) was designed to convert the LYS15 codon (AAA) to a VAL
codon (GTT) and the ARG17 codon (CGA) to a LEU codon (CTG). At
the same time codons 11, 12 and 13 were altered to destroy the
ApaI site resident in the wild type BPTI gene while creating a
new RsrII site, which could be used to screen for correct
clones.

The single stranded VAL1 oligonucleotide was converted to
the double stranded form following the procedure described in
Current Protocols in Molecular Biology (AUSU87). One µg of
the VAL1 oligonucleotide was annealed to one µg of a 20 bp
primer (MB8). The sample was heated to 80°C, cooled to 62°C
and incubated at this temperature for 30 minutes before being
allowed to cool to 37°C. Two µl of a 2.5 mM mixture of dNTPs
and 10 units of Sequenase (U.S.B., Cleveland, Ohio) were added
to the sample and second strand synthesis was allowed to
proceed for 45 minutes at 37°C. One hundred units of XhoI were
added to the sample and digestion was allowed to proceed for 2
hours at 37°C in 100 µl of 1 X XhoI digestion buffer. The
digested DNA was subjected to electrophoresis on a 4% GTG
NuSieve agarose (FMC Bioproducts, Rockland, ME) gel and the 65
bp fragment was excised and purified from melted agarose by
phenol extraction and ethanol precipitation. A portion of the
recovered 65 bp fragment was subjected to electrophoresis on a
4% GTG NuSieve agarose gel for quantitation. One hundred
nanograms of the recovered fragment was dephosphorylated with
1.9 µl of HK(TM) phosphatase (Epicentre Technologies, Madison,
WI) at 37°C for 60 minutes. The reaction was stopped by
heating at 65°C for 15 minutes. BPTI-III MA Rf DNA was digested
with XhoI and StuI and the 8.0 kb fragment was isolated. One
µl of the dephosphorylation reaction (5 ng of double-stranded
VAL1 oligonucleotide) was ligated to 50 ng of the 8.0 kb
XhoI/StuI fragment derived from BPTI-III MA Rf. Ligation
samples were subjected to phenol extraction and DNA was
recovered by ethanol precipitation. Portions of the recovered
ligation DNA were added to 40 µl of electro-competent cells
which were shocked using a Bio-Rad Gene Pulser device set at
1.7 kV, 25 µF and 800 Ω. One ml of SOC media was immediately
added to the cells which were allowed to recover at 37°C for
one hour. Aliquots of the electroporated cells were plated
onto LB plates containing Ap to permit the formation of
colonies.

Phage contained within cultures derived from picked Ap^R
colonies were probed with two radiolabelled oligonucleotides
(PRP1 and ESP1) via the Dot Blot Procedure. Rf DNA was
prepared from cultures containing phage which exhibited a
strong hybridization signal with the ESP1 oligonucleotide but
not with the PRP1 oligonucleotide. Restriction enzyme
analysis verified loss of the ApaI site and acquisition of a
new RsrII site diagnostic for the changes in the P1 region.
Fusion phage were also probed with a radiolabelled
oligonucleotide (VLP1) via the Dot Blot Procedure.
Autoradiography confirmed that fusion phage which previously
failed to hybridize to the PRP1 probe hybridized to the VLP1
probe. DNA sequencing confirmed that the LYS15 and ARG17 codons
had been converted to VAL and LEU codons respectively. The Rf
DNA was designated BPTI(K15V,R17L)-III MA.

13) Affinity of BPTI(K15V,R17L)-III MA Phage for Immobilized
HNE

Forty µl of BPTI(K15V,R17L)-III MA phage (9.8·10^10 pfu/ml)
in TBS/BSA were added to 10 µl of a 50% slurry of immobilized
HNE also in TBS/BSA. Similarly, 40 µl of BPTI(K15L,MGNG)-III
MA (MGNG has SEQ ID NO:12) phage (5.13·10^9 pfu/ml) in TBS/BSA
were added to immobilized HNE. The samples were mixed for 1.5
hours on a Labquake shaker. Beads were washed once for 5 min
with 0.5 ml of TBS/BSA and then 5 times with 0.5 ml of
TBS/1.0% Tween-20 as described previously. Subsequently the
beads were washed sequentially with a series of 50 mM sodium
citrate buffers containing 150 mM NaCl and 1.0 mg/ml BSA, of pH
7.0, 6.0, 5.0, 4.5, 4.0, 3.75, 3.5 and 3.0. In the case of
the BPTI(K15L,MGNG)-III MA (MGNG has SEQ ID NO:12) phage, the
pH 3.75 and 3.0 washes were omitted. Two washes were
performed at each pH and the supernatants were pooled,
neutralized with 1 M Tris pH 8.0, diluted in LB broth and
titered for plaque-forming units on a lawn of cells.

Table 206 illustrates that the pH 4.5 and 4.0 fractions
contained the largest proportion of the recovered
BPTI(K15V,R17L)-III MA phage. By contrast, the
BPTI(K15L,MGNG)-III MA (MGNG has SEQ ID NO:12) phage, like
BPTI(K15L)-III MA phage, were recovered predominantly in the
pH 5.0 and 4.5 fractions, as shown above. The affinity of
BPTI(K15V,R17L) is 48 times greater than that of BPTI(K15L)
for HNE (based on reported Kd values, AUER89 for
BPTI(K15V,R17L) and BECK88b for BPTI(K15L)). That the pH
elution profile for BPTI(K15V,R17L)-III MA phage exhibits a
peak at pH 4.0 while the profile for BPTI(K15L)-III MA phage
displays a peak at pH 4.5 supports the contention that lower
pH conditions are required to dissociate, from immobilized
HNE, fusion phage displaying a BPTI variant with a higher
affinity for free HNE.
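
The 48-fold figure can be checked with a short calculation. The sketch below treats both reported constants as dissociation constants, which is an assumption about the notation, and uses the two values quoted in this document (6·10^-11 M for BPTI(K15V,R17L) and 2.9·10^-9 M for BPTI(K15L)); the function name is hypothetical.

```python
# Constants quoted in the text, interpreted here as dissociation constants (M).
KD_K15V_R17L = 6e-11   # BPTI(K15V,R17L), cited to AUER89
KD_K15L = 2.9e-9       # BPTI(K15L), cited to BECK88b


def fold_affinity(kd_weaker, kd_tighter):
    """Ratio of two dissociation constants; a smaller Kd means tighter binding."""
    return kd_weaker / kd_tighter


ratio = fold_affinity(KD_K15L, KD_K15V_R17L)
print(f"BPTI(K15V,R17L) binds HNE about {ratio:.0f} times more tightly")
```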
* * *

EXAMPLE IV

CONSTRUCTION OF A VARIEGATED POPULATION OF PHAGE DISPLAYING
BPTI DERIVATIVES AND FRACTIONATION FOR MEMBERS THAT DISPLAY
BINDING DOMAINS HAVING HIGH AFFINITY FOR HUMAN NEUTROPHIL
ELASTASE:

We here describe generation of a library of 1000
different potential engineered protease inhibitors (PEPIs)
and the fractionation with immobilized HNE to obtain an
engineered protease inhibitor (Epi) having high affinity for
HNE. Successful Epis that bind HNE are designated EpiNEs.

1) Design of a Mutagenic Oligonucleotide to Create a Library 
of Fusion Phage 

A 76 bp variegated oligonucleotide (MYMUT) was designed
to construct a library of fusion phage displaying 1000
different PEPIs derived from BPTI. The oligonucleotide
contains 1728 different DNA sequences but due to the
degeneracy of the genetic code, it encodes 1000 different
protein sequences. The oligonucleotide was designed so as to
destroy an ApaI site (shown in Table 113) encompassing codons
12 and 13. ApaI digestion could be used to select against the
parental Rf DNA used to construct the library.

The MYMUT oligonucleotide permits the substitution of 5
hydrophobic residues (PHE, LEU, ILE, VAL, and MET via a DTS
codon (D = approximately equimolar A, T, and G; S =
approximately equimolar C and G)) for LYS15. Replacement of
LYS15 in BPTI with aliphatic hydrophobic residues via
semi-synthesis has provided proteins having higher affinity for
HNE than BPTI (TANK77, JERI74a,b, WENZ80, TSCH86, BECK88b). At
position 16, either GLY or ALA is permitted (GST codon).
This is in keeping with the predominance of these two residues
at the corresponding positions in a variety of BPTI homologues
(CREI87). The variegation scheme at position 17 is identical
to that at 15. Limited data are available on the relative
contribution of this residue to the interaction of BPTI
homologues with HNE. A variety of hydrophobic residues at
position 17 was included with the anticipation that they would
enhance the docking of a BPTI variant with HNE. Finally at
positions 18 and 19, 4 (PHE, SER, THR, and ILE via a WYC codon
(W = approximately equimolar A and T; Y = approximately
equimolar T and C)) and 5 (SER, PRO, THR, LYS, and GLN, plus a
stop codon, via an HMA codon (H = approximately equimolar A, C,
and T; M = approximately equimolar A and C)) different amino
acids respectively are encoded. These different amino acid
residues are found in the corresponding positions of BPTI
homologues that are known to bind to HNE (CREI87). Although the
amino acids included in the PEPI library were chosen because
there was some indication that they might facilitate binding to
HNE, it was not and is not possible to predict which
combination of these amino acids will lead to high affinity for
HNE. The mutagenic oligonucleotide MYMUT was synthesized by
Genetic Design Inc. (Houston, Texas).
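
The counts of 1728 DNA sequences and 1000 protein sequences follow directly from the degenerate codons named above (DTS, GST, DTS, WYC and HMA at positions 15 to 19). The following is a minimal sketch that enumerates them, assuming the standard genetic code and the base mixtures stated in the text; the names `MYMUT_SCHEME`, `expand` and `count_library` are hypothetical and chosen only for illustration.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Base mixtures as defined in the text for D, S, W, Y, H and M.
MIXTURES = {"A": "A", "C": "C", "G": "G", "T": "T",
            "D": "ATG", "S": "CG", "W": "AT", "Y": "TC",
            "H": "ACT", "M": "AC"}

# Degenerate codon used at each variegated position (from the text).
MYMUT_SCHEME = {15: "DTS", 16: "GST", 17: "DTS", 18: "WYC", 19: "HMA"}


def expand(degenerate_codon):
    """List every concrete codon encoded by a degenerate codon."""
    return ["".join(p) for p in product(*(MIXTURES[b] for b in degenerate_codon))]


def count_library(scheme):
    """Return (number of DNA sequences, number of stop-free protein sequences)."""
    n_dna, n_protein = 1, 1
    for codon in scheme.values():
        codons = expand(codon)
        residues = {CODON_TABLE[c] for c in codons} - {"*"}
        n_dna *= len(codons)
        n_protein *= len(residues)
    return n_dna, n_protein


print(count_library(MYMUT_SCHEME))
```

Because VAL is encoded by two of the six DTS codons (GTC and GTG) and every other residue by one, VAL is expected to be about twice as common as any other residue at positions 15 and 17 in an unselected library.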

2) Construction of Library of Fusion Phage Displaying
Potential Engineered Protease Inhibitors

The single-stranded mutagenic MYMUT DNA was converted to
the double stranded form with compatible XhoI and StuI ends
and dephosphorylated with HK(TM) phosphatase as described above
for the VAL1 oligonucleotide. BPTI(MGNG)-III MA Rf DNA (MGNG
has SEQ ID NO:12) was digested with XhoI and StuI for 3 hours
at 37°C to ensure complete digestion. The 8.0 kb DNA fragment
was purified by agarose gel electrophoresis and Ultrafree-MC
unit filtration. One µl of the dephosphorylated MYMUT DNA (5
ng) was ligated to 50 ng of the 8.0 kb fragment derived from
BPTI(MGNG)-III MA Rf DNA (MGNG has SEQ ID NO:12). Under these
conditions, the 10:1 molar ratio of insert to vector was found
to be optimal for the generation of transformants. Ligation
samples were extracted with phenol, phenol/chloroform/IAA
(25:24:1, v:v:v) and chloroform/IAA (24:1, v:v) and DNA was
ethanol precipitated prior to electroporation. One µl of the
recovered ligation DNA was added to 40 µl of electro-competent
cells. Cells were shocked using a Bio-Rad Gene Pulser device
as described above. Immediately following electroshock, 1.0
ml of SOC media was added to the cells which were allowed to
recover at 37°C for 60 minutes with shaking. The
electroporated cells were plated onto LB plates containing Ap
to permit the formation of colonies.

To assess the efficiency of the cassette mutagenesis
procedure, 39 transformants were picked at random and phage
present in culture supernatants were applied to a Nytran
membrane and probed using the Dot Blot Procedure. Two Nytran
membranes were prepared in this manner. The first filter was
allowed to hybridize to the CYSB oligonucleotide which had
previously been radiolabelled. The second membrane was
allowed to hybridize to the PRP1 oligonucleotide which had
also been radiolabelled. Filters were subjected to
autoradiography following washing under high stringency
conditions. Of the 39 phage samples applied to the membrane,
all 39 hybridized to the CYSB probe. This indicated that
there was fusion phage in the culture supernatants and that at
least the DNA encoding residues 35-47 appeared to be present
in the phage genomes. Only 11 of the 39 samples hybridized to
the PRP1 oligonucleotide indicating that 28% of the
transformants were probably the parental phage BPTI(MGNG)-III
MA (MGNG has SEQ ID NO:12) used to generate the library. The
remaining 28 clones failed to hybridize to the PRP1 probe
indicating that substantial alterations were introduced into
the P1 region by cassette mutagenesis using the MYMUT
oligonucleotide. Of these 28 samples, all were found to
contain infectious phage indicating that mutagenesis did not
result in frame shift mutations which would lead to the
generation of defective gene III products and non-infectious
phage. (These 28 PEPI-displaying phage constitute a
mini-library, the fractionation of which is discussed below.)
Hence the overall efficiency of mutagenesis was estimated to
be 72% in those cases where ligation DNA was not subjected to
ApaI digestion prior to electroporation.

Bacterial colonies were harvested by overlaying chilled
LB plates containing Ap with 5 ml of ice cold LB broth and
scraping off cells using a sterile glass rod. A total of 4899
transformants were harvested in this manner of which 3299 were
obtained by electroporation of ligation samples which were not
digested with ApaI. Hence we estimate that 72% of these
transformants (i.e. 2375) represent mutants of the parental
BPTI(MGNG)-III MA (MGNG has SEQ ID NO:12) phage derived by
cassette mutagenesis of the P1 position. An additional 1600
transformants were obtained by electroporation of ligation
samples which had been digested with ApaI. If we assume that
all of these clones contain new sequences at the P1 position
then the total number of mutants in the pool of 4899
transformants is estimated to be 2375 + 1600 = 3975. The
total number of potentially different DNA sequences in the
MYMUT library is 1728. We calculate that the library should
display about 90% of the potential engineered protease
inhibitor sequences as follows:

N(displayed) = N(possible) · (1 - exp{-Libsize/N(DNA)})
             = 1000 · (1 - exp{-3975/1728}) = 900

% of possible sequences displayed = 100 · (900 ÷ 1000)
                                  = 90%
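
The estimate above can be reproduced numerically. The sketch below uses only the figures given in the text (72% of 3299 undigested transformants, 1600 ApaI-digested transformants, 1728 DNA sequences, 1000 protein sequences); the function and variable names are hypothetical, and the formula is the one stated above, which treats each transformant as an independent, equally likely draw from the 1728 DNA sequences.

```python
import math


def expected_displayed(n_possible, n_dna, library_size):
    """N(displayed) = N(possible) * (1 - exp(-Libsize / N(DNA)))."""
    return n_possible * (1.0 - math.exp(-library_size / n_dna))


UNDIGESTED = 3299        # transformants from ligations not cut with ApaI
DIGESTED = 1600          # transformants from ApaI-digested ligations
MUTANT_FRACTION = 0.72   # 28 of 39 clones failed to hybridize to PRP1

mutants = round(MUTANT_FRACTION * UNDIGESTED) + DIGESTED
displayed = expected_displayed(1000, 1728, mutants)

print(f"estimated mutants in pool: {mutants}")
print(f"expected sequences displayed: {displayed:.0f} "
      f"({100 * displayed / 1000:.0f}% of 1000)")
```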



3) Fractionation of a Mini-Library of Fusion Phage

We studied the fractionation of the mini-library of 28
PEPIs to establish the appropriate parameters for
fractionation of the entire MYMUT PEPI library. We
anticipated that fractionation could be easier when the
library of fusion phage was much less diverse than the entire
MYMUT library. Fewer cycles of fractionation might be
required to affinity purify a fusion phage exhibiting a high
affinity for HNE. Secondly, since the sequences of all the
fusion phage in the mini-library can be determined, one can
determine the probability of selecting a given fusion phage
from the initial population.

Two ml of the culture supernatants of the 28 PEPIs
described above were pooled. Fusion phage were recovered,
resuspended in 300 mM NaCl, 100 mM Tris, pH 8.0, 1 mM EDTA and
stored on ice for 15 minutes. Insoluble material was removed
by centrifugation for 3 minutes in a microfuge at 4°C. The
supernatant fraction was collected and PEPI phage were
precipitated with PEG-8000. The final phage pellet was
resuspended in TBS/BSA. Aliquots of the recovered phage were
titered for plaque-forming units on a lawn of cells. The
final stock solution consisted of 200 µl of fusion phage at a
concentration of 5.6·10^12 pfu/ml.
a) First Enrichment Cycle 

Forty µl of the above phage stock was added to 10 µl of a
50% slurry of HNE beads in TBS/BSA. The sample was allowed to
mix on a Labquake shaker for 1.5 hours. Five hundred µl of
TBS/BSA was added to the sample and after an additional 5
minutes of mixing, the HNE beads were collected by
centrifugation. The supernatant fraction was removed and the
beads were resuspended in 0.5 ml of TBS/0.5% Tween-20. Beads
were washed for 5 minutes on the shaker and recovered by
centrifugation as above. The supernatant fraction was removed
and the beads were subjected to 4 additional washes with
TBS/Tween-20 as described above to reduce non-specific binding
of fusion phage to HNE beads. Beads were washed twice as
above with 0.5 ml of 50 mM sodium citrate pH 7.0, 150 mM NaCl
containing 1.0 mg/ml BSA. The supernatants from the two
washes were pooled. Subsequently, the HNE beads were washed
sequentially with a series of 50 mM sodium citrate, 150 mM
NaCl, 1.0 mg/ml BSA buffers of pH 6.0, 5.0, 4.5, 4.0, 3.5,
3.0, 2.5 and 2.0. Two washes were performed at each pH and
the supernatants were pooled and neutralized by the addition
of 260 µl of 1 M Tris, pH 8.0. Aliquots of each pH fraction
were diluted in LB broth and titered for plaque-forming units
on a lawn of cells. The total amount of fusion phage (as
judged by pfu) appearing in each pH wash fraction was
determined.

Figure 7 illustrates that the largest percentage of input
phage which bound to the HNE beads was recovered in the pH 5.0
fraction. The elution peak exhibits a trailing edge on the
low pH side suggesting that a small proportion of the total
bound fusion phage might elute from the HNE beads at a pH < 5.
BPTI(K15L)-III phage display a BPTI variant with a moderate
affinity for HNE (Kd = 2.9·10^-9 M) (BECK88b). Since
BPTI(K15L)-III phage elute from HNE beads as a peak centered
on pH 4.75 and the highest peak in the first passage of the
mini-library over HNE beads is centered on pH 5.0, we infer
that many members of the MYMUT PEPI mini-library display PEPIs
having moderate to high affinity for HNE.

To enrich for fusion phage displaying the highest
affinity for HNE, phage contained in the lowest pH fraction
(pH 2.0) from the first enrichment cycle were amplified and
subjected to a second round of fractionation. Amplification
involved the Transduction Procedure described above. Fusion
phage (2000 pfu) were incubated with 100 µl of cells for 15
minutes at 37°C in 200 µl of 1 X Minimal A salts. Two hundred
µl of 2 X LB broth was added to the sample and cells were
allowed to recover for 15 minutes at 37°C with shaking. One
hundred µl portions of the above sample were plated onto LB
plates containing Ap. Five such transduction reactions were
performed yielding a total of 20 plates, each containing
approximately 350 colonies (7000 transformants in total).
Bacterial cells were harvested as described for the
preparation of the MYMUT library and fusion phage were
collected as described for the preparation of the
mini-library. A total of 200 µl of fusion phage (4.3·10^12 pfu/ml
in TBS/BSA) derived from the pH 2.0 fraction from the first
passage of the mini-library was obtained in this manner.

b) Second Enrichment Cycle

Forty µl of the above phage stock was added to 10 µl of a
50% slurry of HNE beads in TBS/BSA. The sample was allowed to
mix for 1.5 hours and the HNE beads were washed with TBS/BSA,
TBS/0.5% Tween and sodium citrate buffers as described above.
Aliquots of neutralized pH fractions were diluted and titered
as described above.

The elution profile for the second passage of the
mini-library over HNE beads is shown in Figure 7. The largest
percentage of the input phage which bound to the HNE beads was
recovered in the pH 3.5 wash. A smaller peak centered on pH
4.5 may represent residual fusion phage from the first passage
of the mini-library which eluted at pH 5.0. The percentage of
total input phage which eluted at pH 3.5 in the second cycle
exceeds the percentage of input phage which eluted at pH 5.0
in the first cycle. This is indicative of more avid binding
of fusion phage to the HNE matrix. Taken together, the
significant shift in the pH elution profile suggests that
selection for fusion phage displaying BPTI variants with
higher affinity for HNE occurred.
c) Third Cycle 

Phage obtained in the pH 2.0 fraction from the second
passage of the mini-library were amplified as above and
subjected to a third round of fractionation. The pH elution
profile is shown in Figure 7. The largest percentage of input
phage was recovered in the pH 3.5 wash as is the case with the
second passage of the mini-library. However, the minor peak
centered on pH 4.5 is diminished in the third passage relative
to the second passage. Furthermore, the percentage of input
phage which eluted at pH 3.5 is greater in the third passage
than in the second passage. In comparison, the
BPTI(K15V,R17L)-III fusion phage elute from HNE beads as a
peak centered on pH 4.25. Taken together, the data suggest
that a significant selection for fusion phage displaying PEPIs
with high affinity for HNE occurred. Furthermore, since more
extreme pH conditions are required to elute fusion phage in
the third passage of the MYMUT library relative to those
conditions needed to elute BPTI(K15V,R17L)-III MA phage, this
suggests that those fusion phage which appear in the pH 3.5
fraction may display a PEPI with a higher affinity for HNE
than the BPTI(K15V,R17L) variant (i.e. Kd < 6·10^-11 M).
d) Characterization of Selected Fusion Phage 

The pH 2.0 fraction from the third passage of the
mini-library was titered and plaques were obtained on a lawn of
cells. Twenty plaques were picked at random and phage derived
from plaques were probed with the CYSB oligonucleotide via the
Dot Blot Procedure. Autoradiography of the filter revealed
that all 20 samples gave a positive hybridization signal
indicating that fusion phage were present and the DNA encoding
residues 35 to 47 of BPTI(MGNG) (MGNG has SEQ ID NO:12) is
contained within the recombinant M13 genomes. Rf DNA was
prepared for the 20 clones and initial dideoxy sequencing
revealed that 12 clones were identical. This sequence was
designated EpiNEα (SEQ ID NO 2^4^ and SEQ ID NO:108) (Table
207). No DNA sequence changes were observed apart from the
planned variegation. Hence the cassette mutagenesis procedure
preserved the context of the planned variegation of the pepi
gene. The Dot Blot Procedure was employed to probe all 20
selected clones from the pH 2.0 fraction from the third
passage of the mini-library with an oligonucleotide homologous
to the sequence of EpiNEα. Following high stringency washing,
autoradiography revealed that all 20 selected clones were
identical in the P1 region. Furthermore dot blot analysis
revealed that of the 28 different phage samples pooled to
create the mini-library, only one contained the EpiNEα
sequence. Hence in just three passes of the mini-library over
HNE beads, 1 out of 28 input fusion phage was selected for and
appears as a pure population in the lowest pH fraction from
the third passage of the library. That the EpiNEα phage elute
at pH 3.5 while BPTI(K15V,R17L)-III MA phage elute at a higher
pH strongly suggests that the EpiNEα protein has a
significantly higher affinity than BPTI(K15V,R17L) for HNE.
4) Fractionation of the MYMUT Library 
a) Three cycles of enrichment 

The same procedure used above to fractionate the
mini-library was used to fractionate the entire MYMUT PEPI library
consisting of fusion phage displaying 1000 different proteins.
The phage inputs for the first, second and third rounds of
fractionation were 4.0·10^11, 5.8·10^10, and 1.1·10^11 pfu
respectively. Figure 8 illustrates that the largest
percentage of input phage which bound to the HNE matrix was
recovered in the pH 5.0 wash in the first enrichment cycle.
The pH elution profile is very similar to that seen for the
first passage of the mini-library over HNE beads. A trailing
edge is also observed on the low pH side of the pH 5.0 peak;
however, this is not as prominent as that observed for the
mini-library. The percentage of input phage which eluted in
the pH 7.0 wash was greater than that eluted in the pH 6.0
wash. This is in contrast to the result obtained for the
first passage of the mini-library and may reflect the
presence of ~20% parental BPTI(MGNG)-III MA (MGNG has SEQ ID
NO:12) phage in the MYMUT library pool. These phage adhere to
the HNE beads weakly (if at all) and elute in the pH 7.0
fraction. That no parent phage were present in the
mini-library is consistent with the absence of a peak at pH 7.0 in
the first passage of the mini-library.

Phage present in the pH 2.0 fraction from the first
passage of the MYMUT library were amplified as described
previously and subjected to a second round of fractionation.
The largest percentage of input phage which bound to the HNE
beads was recovered in the pH 3.5 wash (Figure 8). A minor
peak centered on pH 4.5 was also evident. The fact that more
extreme pH conditions were required to elute the majority of
bound fusion phage suggested that selection of fusion phage
displaying PEPIs with higher affinity for HNE had occurred.
This was also indicated by the fact that the total percentage
of input phage which appeared in the pH 3.5 wash in the second
enrichment cycle was 10 times greater than the percentage of
input which appeared in the pH 5.0 wash in the first cycle.

Fusion phage from the pH 2.0 fraction of the second pass
of the MYMUT library were amplified and subjected to a third
passage over HNE beads. The proportion of fusion phage
appearing in the pH 3.5 fraction relative to that in the 4.5
fraction was greater in the third passage than in the second
passage (Figure 8). Also the amount of fusion phage appearing
in the pH 3.5 fraction was higher in the third passage than in
the second passage. The fact that wash conditions less than
pH 4.25 were required to elute bound fusion phage derived from
the MYMUT library suggests that the EpiNEs displayed by these
phage possess a higher affinity for HNE than the
BPTI(K15V,R17L) variant.

b) Characterization of Selected Clones 

The pH 2.0 fraction from the third enrichment cycle of
the MYMUT library was titered on a lawn of cells. Twenty
plaques were picked at random. Rf DNA was prepared for each
of the clones and fusion phage were collected by PEG
precipitation. Clonally pure populations of fusion phage in
TBS/BSA were prepared and characterized with respect to their
affinity for immobilized HNE. pH elution profiles were
obtained to determine the stringency of the conditions
required to elute bound fusion phage from the HNE matrix.
Figure 9 illustrates the pH profiles obtained for EpiNE clones
1 (SEQ ID NO:51), 3 (SEQ ID NO:46), and 7 (SEQ ID NO:48).
The pH profiles for all 3 clones exhibit a peak centered on pH
3.5. Unlike the pH profile obtained for the third passage of
the MYMUT library, no minor peak centered on pH 4.5 is
evident. This is consistent with the clonal purity of the
selected EpiNE phage utilized to generate the profiles. The
elution peaks are not symmetrical and exhibit a prominent
trailing edge on the low pH side. In all probability, the 10
minute elution period employed is inadequate to remove bound
fusion phage at the low pH conditions. EpiNE clones 1 through 8
have the following characteristics: five clones (identified as
EpiNE1 (SEQ ID NO:51), EpiNE3 (SEQ ID NO:46), EpiNE5 (SEQ ID
NO:52), EpiNE6 (SEQ ID NO:47), and EpiNE7 (SEQ ID NO:48))
display very similar pH profiles centered on pH 3.5. The
remaining 3 clones elute in the pH 3.5 to 4.0 range. There
remains some diversity amongst the 20 randomly chosen clones
obtained from the pH 2.0 fraction of the third passage of the
MYMUT library and these clones might exhibit different
affinities for HNE.

c) Sequences of the EpiNE Clones 

The DNA sequences encoding the P1 regions of the
different EpiNE clones were determined by dideoxy sequencing
of Rf DNA. The sequences are shown in Table 208.
Essentially, only the codons targeted for mutagenesis (i.e. 15
to 19) were altered as a consequence of cassette mutagenesis
using the MYMUT oligonucleotide. Only 1 codon outside the
target region was found to contain an unexpected alteration.
In this case, codon 21 of EpiNE8 was altered from a tyrosine
codon (TAT) to a SER codon (TCT) by a single nucleotide
substitution. This error could have been introduced into the
MYMUT oligonucleotide during its synthesis. Alternatively, an
error could have been introduced when the single-stranded
MYMUT oligonucleotide was converted to the double-stranded
form by Sequenase. Regardless of the reason, the error rate
is extremely low considering only 1 unexpected alteration was
observed after sequencing 20 codons in 19 different clones.
Furthermore, the value of such a mutation is not diminished by
its accidental nature.

Some of the EpiNE clones are identical. The sequences of
EpiNE1, EpiNE3, and EpiNE7 appear a total of 4, 6 and 5 times
respectively. Assuming the 1728 potentially different DNA
sequences encoded by the MYMUT oligonucleotide were present at
equal frequency in the fusion phage library, the frequent
appearance of the sequences for clones EpiNE1, EpiNE3, and
EpiNE7 may have important implications. EpiNE1, EpiNE3, and
EpiNE7 fusion phage may display BPTI variants with the highest
affinity for HNE of all the 1000 potentially different BPTI
variants in the MYMUT library.
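
To give a sense of how unlikely such repeats would be without selection, the sketch below computes a binomial tail probability under the equal-frequency assumption stated above: the chance that one particular DNA sequence out of 1728 appears 4 or more times among 20 independently picked clones. The independence of picks and the function name are assumptions made for illustration only, and the calculation ignores amplification bias and other experimental effects.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


P_UNIFORM = 1 / 1728   # each DNA sequence equally likely (assumption from the text)
N_CLONES = 20          # plaques picked at random

chance = prob_at_least(4, N_CLONES, P_UNIFORM)
print(f"P(a given sequence appears >= 4 times in {N_CLONES} picks) = {chance:.1e}")
```

The resulting probability is far below one in a million, which is consistent with the inference that the repeated clones were enriched by binding to HNE rather than drawn by chance.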

An examination of the sequences of the EpiNE clones is
illuminating. A strong preference for either VAL or ILE at
the P1 position (residue 15) is indicated with VAL being
favored over ILE by 14 to 6. In the MYMUT library, VAL at
position 15 is approximately twice as prevalent as ILE. No
examples of LEU, PHE, or MET at the P1 position were observed
although the MYMUT oligonucleotide has the potential to encode
these residues at P1. This is consistent with the observation
that BPTI variants with single amino acid substitutions of
LEU, PHE, or MET for LYS15 exhibit a significantly lower
affinity for HNE than their counterparts containing either VAL
or ILE (BECK88b).

PHE is strongly favored at position 17, appearing in 12
of 20 codons. MET is the second most prominent residue at
this position but it only appears when VAL is present at
position 15. At position 18 PHE was observed in all 20 clones
sequenced even though the MYMUT oligonucleotide is capable of
encoding other residues at this position. This result is
quite surprising and could not be predicted from previous
mutational analysis of BPTI, model building, or on any
theoretical grounds. We infer that the presence of PHE at
position 18 significantly enhances the ability of each of the
EpiNEs to bind to HNE. Finally at position 19, PRO appears in
10 of 20 codons while SER, the second most prominent residue,
appears at 6 of 20 codons. Of the residues targeted for
mutagenesis in the present study, residue 19 is the nearest to
the edge of the interaction surface of a PEPI with HNE.
Nevertheless, a preponderance of PRO is observed and may
indicate that PRO at 19, like PHE at 18, enhances the binding
of these proteins to HNE. Interestingly, EpiNE5 appears only
once and differs from EpiNE1 only at position 19; similarly,
EpiNE6 differs from EpiNE3 only at position 19. These
alterations may have only a minor effect on the ability of
these proteins to interact with HNE. This is supported by the
fact that the pH elution profiles for EpiNE5 and EpiNE6 are
very similar to those of EpiNE1 and EpiNE3 respectively.

Only EpiNE2 and EpiNE8 exhibit pH profiles which differ
from those of the other selected clones. Both clones contain
LYS at position 19, which may restrict the interaction of BPTI
with HNE. However, we cannot exclude the possibility that
other alterations within EpiNE2 and EpiNE8 (R15L and Y21S
respectively) influence their affinity for HNE.

EpiNE7 was expressed as a soluble protein and analyzed
for HNE inhibition activity by the fluorometric assay of
Castillo et al. (CAST79); the data were analyzed by the method
of Green and Work (GREE53). Preliminary results indicate that
Kd(HNE, EpiNE7) ≤ 8 × 10^-12 M, i.e., at least 7.5-fold lower
than the lowest Kd reported for a BPTI derivative with respect
to HNE.

C. Summary

Taken together, these data show that the alterations
which appear in the P1 region of the EPI mutants confer the
ability to bind to HNE and hence to be selected through the
fractionation process. That the sequences of EpiNE1, EpiNE3,
and EpiNE7 appear frequently in the population of selected
clones suggests that these clones display BPTI variants with
the highest affinity for HNE of any of the 1000 potentially
different variants in the MYMUT library. Furthermore, that pH
conditions less than 4.0 are required to elute these fusion
phage from immobilized HNE suggests that they display BPTI
variants having a higher affinity for HNE than
BPTI(K15V,R17L). EpiNE7 exhibits a lower Kd toward HNE than
does BPTI(K15V,R17L); EpiNE1 and EpiNE3 are also
expected to exhibit lower Kds for HNE than BPTI(K15V,R17L). It
is possible that all of the listed EpiNEs have lower Kds than
BPTI(K15V,R17L).

Position 18 has not previously been identified as a key
position in determining specificity or affinity of aprotinin
homologues or derivatives for particular serine proteases.
None have reported or suggested that phenylalanine at position
18 will confer specificity and high affinity for HNE. One of
the powerful advantages of the present invention is that many
diverse amino-acid sequences may be tested simultaneously.

EXAMPLE V 

SCREENING OF THE MYMUT LIBRARY FOR BINDING TO CATHEPSIN G
BEADS

We fractionated the MYMUT library over immobilized human
Cathepsin G to find an engineered protease inhibitor having
high affinity for Cathepsin G, hereafter designated as an
EpiC. The details of phage binding, elution of bound phage
with buffers of decreasing pH (pH profile), titering of the
phage contained in these fractions, composition of the MYMUT
library, and the preparation of cathepsin G (Cat G) beads are
essentially the same as detailed in Example IV.

pH profiles for the binding of two starting controls,
BPTI-III MK and EpiNE1, are shown in Figure 10. BPTI-III MK
phage, which contains wild type BPTI fused to the III gene
product, shows no apparent binding to Cat G beads in this
assay. EpiNE1 phage was obtained by enrichment with HNE beads
(Example IV and Table 208). EpiNE1-III MK demonstrated little
binding to Cat G beads in the assay, although a small peak or
shoulder is visible in the pH 5 eluted fraction.

Figure 11 shows the pH profiles of the MYMUT library
phage when bound to Cat G beads. Library-Cat G interaction
was monitored using three cycles of binding, pH elution,
transduction of the pH 2 eluted phage, growth of the
transduced phage and rebinding of any selected phage to Cat G
beads, following exactly the procedure used to find variants
of BPTI which bound to HNE. In contrast to the pH profiles
elicited with HNE beads, little enhancement of binding was
observed for the same phage library when cycled with Cat G
beads (with the exception of a possible 'shoulder' developing
in the pH 5 elutions).

To investigate the elution profile around the pH 5 point
in more detail, the binding of phage taken from the pH 4
eluted fraction (bound to Cat G beads) rather than the
previously used pH 2 fraction was examined. Figure 12
demonstrates a marked enhancement of phage binding to the Cat
G beads with an apparent elution peak of pH 5. The binding,
as a fraction of the input phage population, increased with
subsequent binding and elution cycles.

Individual phage clones were picked, grown and analyzed
for binding to Cat G beads. Figure 13 shows the binding and
pH profiles for the individual Cat G binding clones
(designated EpiC variants). All clones exhibited minor peaks,
superimposed upon a gradual fall in bound phage, at pH
elutions of 5 (clones 1 (SEQ ID NOs:54 and 117), 8 (SEQ ID
NOs:56 and 119), 10 (SEQ ID NOs:57 and 120) and 11 (SEQ ID
NOs:54 and 117)) or pH 4.5 (clone 7 (SEQ ID NOs:55 and 118)).

DNA sequencing of the EpiC clones, shown in Table 209
(SEQ ID NOs:54 through 58 and 117 through 121), demonstrated
that the clones selected for binding to Cat G beads
represented a distinct subset of the available sequences in
the MYMUT library and a cluster of sequences different from
that obtained when enriched with HNE beads. The P1 residue in
the EpiC mutants is predominantly MET, with one example of
PHE, while in BPTI it is LYS and in the EpiNE variants it is
either VAL or LEU. In the EpiC mutants residue 16 is
predominantly ALA with one example of GLY and residue 17 is
PHE, ILE or LEU. Interestingly, residues 16 and 17 appear to
pair off by complementary size, at least in this small sample.
The small GLY residue pairs with the bulky PHE while the
relatively larger ALA residue pairs with the less bulky LEU
and ILE. The majority of the available residues in the MYMUT
library for positions 18 and 19 are represented in the EpiC
variants.

Hence, a distinct subset of related sequences from the
MYMUT library has been selected for and demonstrated to bind
to Cat G. A comparison of the pH profiles elicited for the
EpiC variants with Cat G and the EpiNE variants with HNE
indicates that the EpiNE variants have a high affinity for HNE
while the EpiC variants have a moderate affinity for Cat G.
Nonetheless, the starting molecule, BPTI, has virtually no
detectable affinity for Cat G and the selection of clones with
a moderate affinity is a significant finding.

EXAMPLE VI 

SECOND ROUND OF VARIEGATION OF EpiNE7 TO ENHANCE BINDING TO 
HNE 

A. MUTAGENESIS OF EpiNE7 PROTEIN IN THE LOOP COMPRISING
RESIDUES 34-41

In Example IV, we described engineered protease
inhibitors EpiNE1 through EpiNE8 (SEQ ID NOs:46 through 53 and
109-116) that were obtained by affinity selection. Modeling
of the structure of the BPTI-Trypsin complex (Brookhaven
Protein Data Bank entry 1TPA) indicates that the EpiNE protein
surface that interacts with HNE is formed not only by residues
15-19 but also by residues 34-40 that are brought close to
this primary loop when the protein folds (HUBE74, HUBE75,
OAST88). Acting upon this assumption, we changed amino acid
residues in a second loop of the EpiNE7 protein to find EpiNE7
(SEQ ID NO:48) derivatives having higher affinity for HNE.

In the complex of BPTI and trypsin found in Brookhaven
Protein Data Bank entry 1TPA ("1TPA complex"), VAL34 contacts
TYR151 and GLN192. (Residues in trypsin or HNE are underscored
to distinguish them from the inhibitor.) In HNE, the
corresponding residues are ILE151 and PHE192. ILE is smaller
and more hydrophobic than TYR. PHE is larger and more
hydrophobic than GLN. Neither of the HNE side groups can
form hydrogen bonds. When side groups larger
than that of VAL are substituted at position 34, interactions
with residues other than 151 and 192 may be possible. In
particular, an acidic residue at 34 might interact with ARG147
of HNE, which corresponds to SER147 of trypsin in 1TPA. Table 15
shows that, in 59 homologues of BPTI, 13 different amino acids
have been seen at position 34. Thus we allow all twenty amino
acids at 34.

Position 36 is not highly varied; only GLY, SER, and ARG
have been observed, with GLY by far the most prevalent. In the
1TPA complex, GLY36 contacts HIS57 and GLN192. HIS57 is
conserved and GLN192 corresponds to PHE192 of HNE. Adding a
methyl group to GLY36 could increase hydrophobic interactions
with PHE192 of HNE. GLY36 is in a conformation that most amino
acids can achieve: φ = -79° and ψ = -9° (Deisenhofer, cited in
CREI84, p. 222).

In the 1TPA complex, ARG39 contacts SER96, ASN97, THR98,
LEU99 (SEQ ID NO:13), GLN175, and TRP215. In HNE, all of the
corresponding residues are different! SER96 is deleted; ASN97
corresponds to ASP97 (bearing a negative charge); THR98
corresponds to PRO98; LEU99 corresponds to the residues VAL99,
ASN99a, and LEU99b; GLN175 is deleted; and TRP215 corresponds to
PHE215. Position 39 shows a moderately high degree of
variability with 7 different amino acids observed, viz. ARG,
GLY, LYS, GLN, ASP, PRO, and MET. Having seen PRO (the most
rigid amino acid), GLY (the most flexible amino acid), LYS and
ASP (basic and acidic amino acids), we assume that all amino
acids are structurally compatible with the aprotinin backbone.
Because the context of residue 39 has changed so much, we
allow all 20 amino acids.

Position 40 is not highly variable; only GLY and ALA have
been observed (with similar frequency, 24:16). Position 41 is
moderately varied, showing ASN, LYS, ASP, GLN, HIS, GLU, and
TYR. The side groups of residues 40 and 41 are not thought to
contact trypsin in the 1TPA complex. Nevertheless, these
residues can exert electrostatic effects and can influence the
dynamic properties of residues 39, 38, and others. The choice
of residues 34, 36, 39, 40, and 41 to be varied simultaneously
illustrates the rule that the varied residues should be able
to touch one molecule of the target material at one time or be
able to influence residues that touch the target. These
residues are not contiguous in sequence, nor are they
contiguous on the surface of EpiNE7. They can, nonetheless,
all influence the contacts between the EpiNE and HNE.

Amino acid residues VAL34, GLY36, MET39, GLY40, and ASN41
were variegated as follows: any of 20 genetically encodable
amino acids at positions 34 and 39 (NNS codons in which N is
approximately equimolar A, C, T, G and S is approximately
equimolar C and G), GLY or ALA at positions 36 and 40 (GST
codon), and [ASP, GLU, HIS, LYS, ASN, GLN, TYR, or stop] at
position 41 (NAS codon). Because the PEPIs are displayed
fused to gIII protein, DNA containing stop codons will not
give rise to infectious phage in non-suppressor hosts.

For cassette mutagenesis, a 61 base long oligonucleotide
DNA population was synthesized that contained 32,768 different
DNA sequences coding on expression for a total of 11,200 amino
acid sequences. This oligonucleotide extends from the third
base of codon 51 in Table 113 (the middle of the StuI site) to
base 2 of codon 70 (the EagI site (identified as XmaIII in
Table 113)).
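
The two library sizes quoted above follow directly from the
variegation scheme (NNS at 34 and 39, GST at 36 and 40, NAS at
41). The sketch below is illustrative only: it expands each
degenerate codon with the standard genetic code and multiplies
the per-position counts. The helper names expand and amino_acids
are hypothetical, and stop codons are simply excluded from the
amino acid count.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Degenerate base symbols used in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "S": "CG"}


def expand(degenerate):
    """List every concrete codon matched by a degenerate codon."""
    return ["".join(c) for c in product(*(IUPAC[b] for b in degenerate))]


def amino_acids(degenerate):
    """Set of amino acids encoded by a degenerate codon (stops removed)."""
    return {CODON_TABLE[c] for c in expand(degenerate)} - {"*"}


# Variegation scheme from the text: residue position -> degenerate codon.
scheme = {34: "NNS", 36: "GST", 39: "NNS", 40: "GST", 41: "NAS"}

dna_sequences = 1
protein_sequences = 1
for position, codon in scheme.items():
    dna_sequences *= len(expand(codon))
    protein_sequences *= len(amino_acids(codon))

print(dna_sequences, protein_sequences)
```

The product of the codon counts (32 x 2 x 32 x 2 x 8) gives the
number of DNA sequences, and the product of the amino acid counts
(20 x 2 x 20 x 2 x 7) gives the number of protein sequences.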

We used a mutagenesis method similar to that described by
Cwirla et al. (CWIR90) and other standard DNA manipulations
described in Maniatis et al. (MANI82) and Sambrook et al.
(SAMB89). EpiNE7 RF DNA was restricted with EagI and StuI,
agarose gel purified, and dephosphorylated using HK(TM)
phosphatase (Epicentre Technologies). We prepared insert by
annealing two small, 16 base and 17 base, phosphorylated
synthetic DNA primers to the phosphorylated 61 base long
oligonucleotide population described above. The resulting
insert DNA population had the following features: double
stranded DNA ends capable of regenerating upon ligation the
EagI (5' overhang) and StuI (blunt) restricted sites of the
EpiNE7 RF DNA, and single stranded DNA in the central
mutagenic region. Insert and EpiNE7 vector DNA were ligated.
Ligation samples were used to transfect competent XL1-Blue(TM)
cells which were subsequently plated for formation of
ampicillin resistant (Ap^R) colonies. The resulting
phage-producing, Ap^R colonies were harvested and recombinant
phage was isolated. By following these procedures, a phage library
of 1.2 × 10^5 independent transformants was assembled. We
estimated that 97.4% of the approximately 3.3 × 10^4 possible DNA
sequences were represented:

    0.974 = 1 - exp(-1.2 × 10^5 / 32768).

The probability of observing the parental sequence is higher
than 0.974 because VAL occurs twice in the NNS codon:

    Probability of seeing (V34, G36, M39, G40, N41)
        = 1 - exp(-(1.2 × 10^5 × 2 / 32768))
        = 1 - exp(-7.32)
        = 1 - 6.6 × 10^-4
        = 0.99934

Furthermore, we expect that a small amount (for example, 1
part in 1000) of uncut or once-cut and religated parental
vector would come through the procedures used. Thus the
parental sequence is almost certainly present in the library.
This library is designated the KLMUT library.
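
The two coverage figures above can be reproduced numerically.
The following minimal sketch assumes, as the estimate in the
text does, that every DNA sequence is equally likely in each
independent transformant, so that the number of copies of a
given sequence is approximately Poisson distributed. The
function name coverage and its copies parameter are
hypothetical.

```python
import math


def coverage(n_transformants, n_sequences, copies=1):
    """Probability that a given sequence appears at least once.

    copies is the number of distinct DNA sequences that encode the
    protein of interest (2 for the parent, since VAL has two NNS codons).
    """
    return 1.0 - math.exp(-n_transformants * copies / n_sequences)


n_transformants = 1.2e5  # independent transformants in the library
n_sequences = 32768      # possible DNA sequences in the oligonucleotide

p_any_sequence = coverage(n_transformants, n_sequences)
p_parental = coverage(n_transformants, n_sequences, copies=2)

print(f"any given DNA sequence: {p_any_sequence:.3f}")
print(f"parental protein:       {p_parental:.5f}")
```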
B. AFFINITY SELECTION WITH IMMOBILIZED HUMAN NEUTROPHIL
ELASTASE

1) First Fractionation 

We added 1.1 -10 s plaque forming units of the KLMUT library 
to 10 µl of a 50% slurry of agarose-immobilized human
neutrophil elastase beads (HNE from Calbiochem cross-linked to
Reacti-Gel(TM) agarose beads from Pierce Chemical Co. following
manufacturer's directions) in TBS/BSA. Following 3 hours
incubation at room temperature, the beads were washed and
phage was eluted as done in the selection of EpiNE phage
isolates (Example IV). The progression in lowering pH during
the elution was: pH 7.0, 6.0, 5.0, 4.5, 4.0, 3.5, 3.0, 2.5,
and 2.0. Beads carrying phage remaining after pH 2.0 elution
were used to infect XL1-Blue(TM) cells that were plated to allow
plaque formation. The 348 resulting plaques were pooled to
form a phage population for further affinity selection. A
population of phage particles containing 6.0 × 10^8 plaque forming
units was added to 10 µl of a 50% slurry of agarose-immobilized
HNE beads in TBS/BSA and the above selection
procedure was repeated.

Following this second round of affinity selection, a
portion of the beads was mixed with XL1-Blue(TM) cells and
plated to allow plaque formation. Of the resulting plaques,
480 were pooled to form a phage population for a third
affinity selection. We repeated the selection procedure
described above using a population of phage particles
containing 3.0 × 10^9 plaque forming units. Portions of the pH
2.0 eluate and of the beads were plated with XL1-Blue(TM) cells
to allow formation of plaques. Individual plaques were picked
for preparation of RF DNA. From DNA sequencing, we determined
the amino acid sequence in the mutated secondary loop of 15
EpiNE7-homolog clones. The sequences are given in Table 210
as EpiNE7.1 through EpiNE7.20 (SEQ ID NOs:59-70). Three
sequences were observed twice: EpiNE7.4 and EpiNE7.14 (SEQ ID
NO:63); EpiNE7.8 and EpiNE7.9 (SEQ ID NO:60); and EpiNE7.10
and EpiNE7.20 (SEQ ID NO:65). EpiNE7.4 was eluted at pH 2
while EpiNE7.14 was obtained by culturing HNE beads that had
been washed with pH 2 buffer. Similarly, EpiNE7.10 came from
pH 2 elution but EpiNE7.20 came from beads. EpiNE7.8 and
EpiNE7.9 both came from pH 2 elution. Interestingly, EpiNE7.8
is found in both the first and second fractionations
(EpiNE7.31 (vide infra)).
2) Second Fractionation 

The purpose of affinity fractionation is to reduce
diversity on the basis of affinity for the target. The first
enrichment step of the first fractionation reduced the
population from 3 × 10^4 possible DNA sequences to no more than
348. This might be too severe and some of the loss of
diversity might not be related to affinity. Thus we carried
out a second fractionation of the entire KLMUT library seeking
to reduce the diversity more gradually.

We added 2.0 × 10^11 plaque forming units of the KLMUT
library to 10 µl of a 50% slurry of agarose-immobilized HNE
beads in TBS/BSA. Following 3 hours incubation at room
temperature, phage were eluted as described above. We then
transduced XL1-Blue(TM) cells with portions of the pH 2.0 eluate
and plated for Ap^R colonies.

The resulting phage-producing colonies were harvested to
obtain amplified phage for further affinity selection. A
population of these phage particles containing 2.0 × 10^10 plaque
forming units was added to 10 µl of a 50% slurry of
agarose-immobilized HNE beads in TBS/BSA and incubated for 90
minutes at room temperature. Phage were eluted as described above
and portions of the pH 2.0 eluate were used to transduce
XL1-Blue(TM) cells. We plated the transductants for Ap^R colonies
and obtained amplified phage from the harvested colonies.

In a third round of affinity selection, a population of
phage particles containing 3.0 × 10^10 plaque forming units was
added to 20 µl of a 50% slurry of agarose-immobilized HNE beads
and incubated for 2 hours at room temperature. We eluted the
phage with the following pH washes: pH 7.0, 6.0, 5.0, 4.5,
4.0, 3.5, 3.25, 3.0, 2.75, 2.5, 2.25, and 2.0. After plating
a portion of the pH 2.0 eluate fraction for plaque formation,
we picked individual plaques for preparation of RF DNA. DNA
sequencing yielded the amino acid sequence in the mutated
secondary loop for 20 EpiNE7 homolog clones. These sequences,
together with EpiNE7 (SEQ ID NO:48), are given in Table 210 as
EpiNE7.21 through EpiNE7.40 (SEQ ID NOs:71 through 87). The
plaques observed when EpiNEs are plated display a variety of
sizes. EpiNE7.21 through EpiNE7.30 (SEQ ID NOs:71 through 80)
were picked with attention to plaque size: 7.21, 7.22, and
7.23 from small plaques, 7.24 through 7.30 from plaques of
increasing size, with 7.30 coming from a large plaque. TRP
occurs at position 39 in EpiNE7.21, 7.22, 7.23, 7.25, and
7.30. Thus plaque size does not correlate with the appearance
of TRP at 39. One sequence, EpiNE7.31, from this
fractionation is identical to sequences EpiNE7.8 and EpiNE7.9
obtained in the first fractionation. EpiNE7.30, EpiNE7.34,
and EpiNE7.35 are identical, indicating that the diversity of
the library has been greatly reduced. It is believed that
these sequences have an affinity for HNE that is at least
comparable to that of EpiNE7 and probably higher. Because the
parental EpiNE7 sequence did not recur, it is quite likely
that some or all of the EpiNE7.nn derivatives have higher
affinity for HNE than does EpiNE7.
3) Conclusions

One can draw some conclusions. First, because some
sequences have been isolated repeatedly, the fractionation is
nearly complete. The diversity has been reduced from ≥10^4 to a
few tens of sequences.

Second, the parental sequence has not recurred. At 39,
MET did not occur! At position 34, VAL occurred only once in
35 sequences. At 41, ASN occurred only 4 of 35 times. At 40,
GLY occurred 17 of 35 times. At position 36, GLY occurred 34
of 35 times, indicating that ALA is undesirable here.
EpiNE7.24 (SEQ ID NO:74) and EpiNE7.36 (SEQ ID NO:83) are most
like EpiNE7 (SEQ ID NO:48), having three of the varied
residues identical to EpiNE7.

Third, the results of the first and second fractionations
are similar. In the second fractionation, the prevalence of
TRP at position 39 is more marked (5/15 in fractionation #1,
14/20 in #2). It is possible that the first fractionation
lost some high-affinity EPIs through under-sampling.
Nevertheless, the first fractionation was clearly quite
successful.

Fourth, there are strong preferences at positions 39 and
36 and lesser but significant preferences at positions 34 and
41 with little preference at 40.

Heretofore, no homologues of aprotinin have been reported
having ALA at 36. In the selected EpiNE7.nn sequences, the
preference for GLY over ALA at position 36 is 34:1. This
preference is probably not due to differences in protein
stability. The process of the present invention, as applied
in the present example, does not select against proteins on
the basis of stability so long as the protein does fold and
function at the temperature used in the procedure. ALA is
probably tolerated at position 36 well enough to allow those
proteins having ALA36 to fold and function; one example was
found having ALA36. It may be relevant that the sole sequence
having ALA36 also has GLY34. The flexibility of GLY at 34 may
allow the methyl of ALA at 36 to fit into HNE in a way that is
not possible when other amino acids occupy position 34.

At position 39, all 20 amino acids were allowed, but only
seven were seen. TRP is strongly preferred with 19
occurrences, HIS second with six occurrences, and LEU third
with five occurrences. No homologues of aprotinin have been
reported having either TRP or HIS at position 39 as are now
disclosed. Although LEU is represented in the NNS codon
thrice, TRP and HIS have but one codon each and their
prevalence is surprising. We constructed a model having HNE
(Brookhaven Protein Data Bank entry 1HNE) and EpiNE7.9 (SEQ ID
NO:60) spatially related as in the 1TPA complex. (The α
carbons of conserved internal residues of HNE were
superimposed on the corresponding α carbons of trypsin, rms
deviation ≈0.5 Å.) Inspection of this model indicates that
TRP39 could interact with the loop of HNE that comprises VAL99,
ASN99a, and LEU99b. HIS is observed in six cases; HIS is
hydrophobic, aromatic, and in some ways similar to TRP. LEU39
in EpiNE7.5 could also interact with these residues if the
loop moves a short distance. GLU occurred twice while LYS,
ARG, and GLN occurred once each. In BPTI, the Cα of residue 39
is ≈10 Å from the Cα of residue 15 so that TRP39 interacts with
different features of HNE than do the amino acids substituted
at position 15. Residue 34 is well separated from each of the
residues 15, 18, and 39; thus it contacts different features
on the HNE surface from these residues. Although serine
proteases are highly similar near the catalytic site, the
similarity diminishes rapidly outside this conserved region.
The specificity of serine proteases is in fact determined by
more interactions than the P1 residue. To make an inhibitor
that is highly specific to HNE, we must go beyond matching the
requirement at P1. Thus, the substitutions at 18 (determined
in Example IV), 39, 34, and other non-P1 positions are
invaluable in customizing the EpiNE to HNE. When making an
inhibitor customized to a different serine protease, it is
likely that many, if not all, of these positions will be
changed to obtain high affinity and specificity. It is a
major advantage of the present method that many such
derivatives may be tested rapidly.
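
The remark about codon representation can be made quantitative.
The sketch below is an illustration under stated assumptions,
not a statistical analysis from the original work: it counts how
many of the 32 NNS codons encode each residue and compares the
observed counts at position 39 (taken from the text) with the
counts expected if the 35 sequenced clones had been drawn from
an unselected library with equimolar bases. The variable names
are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# All 32 NNS codons (N = A/C/G/T, S = C/G) and codons per residue.
nns_codons = ["".join(c) for c in product("ACGT", "ACGT", "CG")]
codons_per_aa = Counter(CODON_TABLE[c] for c in nns_codons)

# Residues observed at position 39 in the 35 clones (one-letter codes).
observed_39 = {"W": 19, "H": 6, "L": 5, "E": 2, "K": 1, "R": 1, "Q": 1}
total = sum(observed_39.values())

# Ratio of observed count to the count expected from codon usage alone.
enrichment = {}
for aa, seen in observed_39.items():
    expected = total * codons_per_aa[aa] / len(nns_codons)
    enrichment[aa] = seen / expected

for aa, ratio in sorted(enrichment.items(), key=lambda kv: -kv[1]):
    print(f"{aa}: {codons_per_aa[aa]} codon(s), observed/expected = {ratio:.1f}")
```

On this simple model TRP, with a single codon, is the most
over-represented residue, while LEU, with three codons, is seen
only modestly more often than codon usage alone would predict.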

At position 34, all 20 amino acids were allowed.
Fourteen have been seen. LYS appeared seven times, GLU five
times, THR four times, LEU three times, GLY, ASP, GLN, MET,
ASN, and HIS twice each, and ARG, PRO, VAL, and TYR once each.
There were no instances of ALA, CYS, PHE, ILE, SER, or TRP.
No homologue of aprotinin with GLU, GLY, or MET at 34 has been
reported heretofore. Here, as at position 39, the library
contains an excess of LEU over LYS and GLU. Thus, we infer
that the prevalence of LYS, GLU, THR, and LEU is related to
tighter binding of EpiNEs having these amino acids at position
34. The prevalence of LYS is surprising, as there are no
acidic groups on HNE in the neighborhood. The Nζ of LYS34
could interact with a main-chain carbonyl oxygen while the
methylene groups interact with ILE151 and/or PHE192. LEU34 could
interact with ILE151 and/or PHE192 while GLU34 could interact
with ARG147.

There has been little if any enrichment at positions 40
and 41. Alanine is somewhat preferred at 40; ALA:GLY::18:17.
Both ALA and GLY have been reported in aprotinin homologues.

Position 41 shows a preponderance of LYS (12 occurrences)
and GLU (7), but all seven encodable amino acids have been
seen. The overall distribution is LYS 12, GLU 7, ASP 4, ASN 4,
GLN 3, HIS 3, and TYR 2. Heretofore, no homologues of
aprotinin having GLU, GLN, HIS, or TYR at position 41 have
been reported.

One sequence, EpiNE7.25 (SEQ ID NO:75), contains an
unexpected change at position 47, SER to LEU. Heretofore, all
homologues of aprotinin reported have had either SER or THR at
position 47. The side groups of SER and THR can form hydrogen
bonds to main-chain atoms at the beginning of the short α
helix.

The consensus sequence, LYS34, GLY36, TRP39, ALA40, LYS41, was
not observed. EpiNE7.23 (SEQ ID NO:73) is quite close,
differing only at position 40 where the preference for ALA is
very, very weak.

We tested EpiNE7.23 (the sequence closest to consensus)
against EpiNE7 (SEQ ID NO:48) on HNE beads. Figure 16 shows
the fractionation of strains of phage that display these two
EpiNEs. Phage that display EpiNE7 are eluted at higher pH
than are phage that display EpiNE7.23. Furthermore, more of
the EpiNE7.23 phage are retained than of the EpiNE7 phage.
Note the peak at pH 2.25 in the EpiNE7.23 elution. This
suggests that EpiNE7.23 has a higher affinity for HNE than
does EpiNE7. In a similar way, we tested EpiNE7.4 (SEQ ID
NO:63) and found that it is not retained on HNE as well as
EpiNE7. This is consistent with the fractionation not being
complete.

Further fractionation, characterization of clonally pure
EpiNE7.nn strains, and biochemical characterization of soluble
EpiNE7.nn derivatives will reveal which sequences in this
collection have the highest affinity for HNE.

Fractionation of the library involves a number of
factors. Differential binding allows phage that display PBDs
having the desired binding properties to be enriched.
Differences in infectivity, plaque size, and phage yield are
related to differences in the sequence of the PBDs, but are
not directly correlated to affinity for the target. These
factors may reduce the effectiveness of the desired
fractionation. An additional factor that may be present is
differential abundance of PBD sequences in the initial
library. One step we employ to reduce the effect of
differential infectivity is to transduce cells with isolated
phage rather than to infect them. In the first fractionation,
we did not obtain sufficient material for transduction and so
infected cells; this fractionation was successful. Because
the parental sequence, EpiNE7, was selected for a sequence at
residues 15 through 19 that confers high affinity for HNE, we
believe that many, if not most, members of the KLMUT
population have significant affinity for HNE. Thus the
present fractionations must separate variants having very high
affinity for HNE from those merely having high affinity for
HNE. It is perhaps relevant that BPTI-III MK phage are only
partially eluted from immobilized trypsin at pH 2.2;
Kd(trypsin, BPTI) = 6.0 × 10^-14 M. Elution of EpiNE7-III MA phage
from immobilized HNE gives a peak at about pH 3.5 with some
phage appearing at lower pH; Kd(HNE, EpiNE7) ≤ 1 × 10^-11 M. We
recycled phage that either were eluted at pH 2.0 or that were
retained after elution with pH 2.0 buffer. A large percentage
of EpiNE7-III MA phage would have been washed away with the
fractions at pHs less acid than 2.0. This, together with the
marked preferences at positions 39, 36, and 34, strongly
suggests that we have successfully fractionated the KLMUT
library on the basis of affinity for HNE and that the
EpiNE7.nn proteins have higher affinity for HNE than does
EpiNE7 or any other reported aprotinin derivative.

Fractionation in a few stringent steps emphasizes the
affinity of the PBD and allows isolation of variants that
confer a small-plaque phenotype on cells (through low
infectivity or by slowing cell growth). More gradual
fractionation allows observation of a wider variety of
variants that show high affinity and favors sequences that
start at low abundance. Gradual fractionation also favors
selection of variants that do not confer a small-plaque
phenotype; such variants may be easier to work with and are
preferred for some purposes. In either case, it is preferred
to fractionate until there is a manageable number of distinct
isolates and to characterize these isolates as pure clones.
Thus, it is desirable, in most cases, to fractionate a library
in more than one way.

None have identified positions 39 and 34 as key in
determining the affinity and specificity of aprotinin
homologues and derivatives for particular serine proteases.
None have suggested that tryptophan at 39 or charged amino
acids (LYS or GLU) at 34 will enhance binding of an aprotinin
homologue to HNE. Different substitutions at these positions
are likely to confer different specificity on those
derivatives. One of the major advantages of the present
invention is that many substitutions at several locations may
be tested with an amount of effort not much greater than is
required to test a single derivative by previously used
methods.

There exist a number of proteases produced by
lymphocytes. Neutrophil elastase is not the only lymphocytic
protease that degrades elastin. The protease p29 is related
to HNE. Screening the MYMUT and KLMUT libraries against
immobilized p29 is likely to allow isolation of an aprotinin
derivative having high affinity for p29.
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EXAMPLE VII 

BPTI: VIII BOUNDARY EXTENSIONS. 

The aim of this work was to introduce peptide extensions
between the C-terminus of the BPTI domain and the N-terminus
of the M13 major coat protein within the fusion protein. The
reasons for this were twofold: first, to alter potential
protease cleavage sites at the interdomain boundary (as
evidenced by an apparent instability of the fusion protein),
and second, to increase interdomain flexibility.

1. Insertion of a variegated pentapeptide at the BPTI:VIII
interface.

The gene shown in Table 113 was modified by insertion of
five RVT codons between codons 81 and 82. Two synthetic
oligonucleotides were designed and custom synthesized. The
first consisted of, from 5' to 3': a) from base 2 of codon 77
to the end of codon 81, b) five copies of RVT, and c) from
codon 82 to the second base of codon 94. The second comprised
20 bases complementary to the 3' end of the first
oligonucleotide. Each RVT codon allows one of the amino acids
[T, N, S, A, D, and G] to be encoded. This variegation codon
was picked because: a) each amino acid occurs once, and b) all
these amino acids are thought to foster a flexible linker.
When annealed, the primed variegated oligonucleotide was
converted to double-stranded DNA using standard methods.
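
The statement that each RVT codon encodes exactly one of six
amino acids, each once, can be checked by expanding the
degenerate codon. The following Python sketch is illustrative
only; it assumes the standard genetic code and the IUPAC
meanings R = A or G and V = A, C, or G, and the helper name
expand is hypothetical.

```python
from itertools import product

# Standard genetic code, built in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# IUPAC degenerate base codes needed for the RVT codon.
IUPAC = {"R": "AG", "V": "ACG", "T": "T"}


def expand(degenerate_codon):
    """Return every concrete codon matched by a degenerate codon."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in degenerate_codon))]


rvt_codons = expand("RVT")
rvt_amino_acids = sorted(CODON_TABLE[c] for c in rvt_codons)

# Five independent RVT codons give this many DNA (and peptide) sequences.
library_size = len(rvt_codons) ** 5

print(rvt_codons)
print(rvt_amino_acids)
print(library_size)
```

Because six codons map one-to-one onto six amino acids, the
number of distinct pentapeptides equals the number of distinct
DNA sequences in the variegated insert.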

The duplex was digested with restriction enzymes SfiI and
NarI and the resulting 45 base-pair fragment was ligated into
a similarly cleaved OCV, M13MB48 (Example I.l.iii.a). The
ligated material was transfected into competent E. coli cells
(strain XL1-Blue(TM)) and plated onto a lawn of the same cells
on normal bacterial growth plates to form plaques. The
bacteriophage contained within the plaques were analyzed using
standard methods of nitrocellulose lifts and probing using a
32P-labeled oligonucleotide complementary to the DNA sequence
encoding the fusion protein interface. Approximately 80% of
the plaques probed poorly with this oligonucleotide and hence
contained new sequences at this position.

A pool of phages, containing the novel interface 
pentapeptide extensions, was collected by combining the phage 
extracted from the plated plaques. 

2. Adding multiple unit extensions to the fusion protein
interface.

The M13 gene III product contains 'stalk-like' regions as
implied by electron micrographic visualization of the
bacteriophage (LOPE85). The predicted amino acid sequence of
this protein contains repeating motifs, which include:

glu.gly.gly.gly.ser (EGGGS) (SEQ ID NO:10) seven times
gly.gly.gly.ser (GGGS) (SEQ ID NO:14) three times
glu.gly.gly.gly.thr (EGGGT) (SEQ ID NO:15) once.

The aim of this section was to insert, at the domain
interface, multiple unit extensions which would mirror the
repeating motifs observed in the gene III product.

Two synthetic oligonucleotides were designed and custom
synthesized. GLY is encoded by four codons (GGN); when
translated in the opposite direction, these codons give rise
to THR, PRO, ALA, and SER. The third base of these codons was
picked so that translation of the oligonucleotide in the
opposite direction would encode SER. When annealed, the
synthetic oligonucleotides give the following unit duplex
sequence (an EGGGS linker):

       E   G   G   G   S            (SEQ ID NO:10)
5'   C.GAG.GGA.GGA.GGA.TC      3'   (SEQ ID NO:100)
3'      TC.CCT.CCT.CCT.AGG.C   5'   (SEQ ID NO:101)
       (L) (S) (S) (S) (G)          (SEQ ID NO:261)

The duplex has a common two base pair 5' overhang (GC) at
either end of the linker, which allows for both the ligation of
multiple units and the ability to clone into the unique NarI
recognition sequence present in OCVs M13MB48 and GemMB42.
This site is positioned within 1 codon of the DNA encoding the
interface. The cloning of an EGGGS linker (SEQ ID NO:10) (or
multiple linker) into the vector NarI site destroys this
recognition sequence. Insertion of the EGGGS linker (SEQ ID
NO:10) in reverse orientation leads to insertion of GSSSL
(SEQ ID NO:16) into the fusion protein.
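
The effect of linker orientation can be traced in a short
Python sketch. It is illustrative only: it assumes the
standard genetic code, takes the flanking codons 79 to 84
(GGT.GGC / GCC.GCT.GAA.GGT) from the gene listing for the
single-linker insertion, and models NarI cleavage as
GG^CGCC. The names LEFT, RIGHT, and insert_linker are
hypothetical.

```python
from itertools import product

# Standard genetic code, built in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons of seq, reading from its first base."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


TOP = "CGAGGGAGGAGGATC"                     # 5'->3', SEQ ID NO:100
BOTTOM = "CG" + reverse_complement(TOP[2:])  # 5'->3', SEQ ID NO:101

NARI = "GGCGCC"
LEFT = "GGTGG"           # codons 79-80 up to the NarI cut (GG^CGCC)
RIGHT = "CGCCGCTGAAGGT"  # remainder of the site plus codons 81-84


def insert_linker(oligo):
    """Coding strand after ligating one linker unit into the cut site."""
    return LEFT + oligo + RIGHT


parent = LEFT + RIGHT
forward = insert_linker(TOP)     # linker in the orientation drawn above
reverse = insert_linker(BOTTOM)  # linker in the opposite orientation

print(BOTTOM)
print(translate(forward), NARI in forward)
print(translate(reverse), NARI in reverse)
```

In this model the forward insertion reads G-G-E-G-G-G-S-A-A-E-G
and the reverse insertion reads G-G-G-S-S-S-L-A-A-E-G, and
neither product retains the NarI recognition sequence that is
present in the parent.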

Addition of a single EGGGS linker (SEQ ID NO:10) at the
NarI site of the gene shown in Table 113 leads to the
following gene:

 79  80  80a 80b 80c 80d 80e 81  82  83  84
 G   G   E   G   G   G   S   A   A   E   G    (SEQ ID NO:17)
GGT.GGC.GAG.GGA.GGA.GGA.TCC.GCC.GCT.GAA.GGT   (SEQ ID NO:102)



Note that there is no preselection for the orientation of
the linker(s) inserted into the OCV and that multiple linkers
of either orientation (with the predicted EGGGS (SEQ ID NO:10)
... orientations (inverted repeats of DNA) could occur.

A ladder of increasingly large multiple linkers was
established by annealing and ligating the two starting
oligonucleotides containing different proportions of 5'
phosphorylated and non-phosphorylated ends. The logic behind
this is that ligation proceeds from the 3' unphosphorylated
end of an oligonucleotide to the 5' phosphorylated end of
another. The use of a mixture of phosphorylated and
non-phosphorylated oligonucleotides allows for an element of
control over the extent of multiple linker formation. A
ladder showing a range of insert sizes was readily detected by
agarose gel electrophoresis, spanning 15 bp (1 unit duplex; 5
amino acids) to greater than 600 base pairs (40 ligated
linkers; 200 amino acids).
Large inverted repeats can lead to genetic instability.
Thus we chose to remove them, prior to ligation into the OCV,
by digesting the population of multiple linkers with the
restriction enzymes AccIII or XhoI, since the linkers, when
ligated 'head-to-head' or 'tail-to-tail', generate these
recognition sequences. Such a digestion significantly reduces
the range in sizes of the multiple linkers to between 1 and 8
linker units (i.e., between 5 and 40 amino acids in steps of
5), as assessed by agarose gel electrophoresis.
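
The way opposed linker units create new restriction sites can
be illustrated by joining two unit sequences in each of the
four possible orientation pairs and searching the junction.
This Python sketch is illustrative only; it assumes the
recognition sequences TCCGGA for AccIII and CTCGAG for XhoI,
and the function names are hypothetical. It does not say which
junction the text calls 'head-to-head' and which
'tail-to-tail'.

```python
# One linker unit as read on the coding strand, in each orientation.
FORWARD = "CGAGGGAGGAGGATC"  # encodes E-G-G-G-S (SEQ ID NO:100)
REVERSE = "CGGATCCTCCTCCCT"  # opposite strand, 5'->3' (SEQ ID NO:101)

# Recognition sequences assumed for the two enzymes named in the text.
SITES = {"AccIII": "TCCGGA", "XhoI": "CTCGAG"}

UNIT_BP = len(FORWARD)  # 15 bp per unit, i.e. 5 codons


def junction_sites(first, second):
    """Names of the enzymes whose sites appear when two units are joined."""
    joined = first + second
    return sorted(name for name, site in SITES.items() if site in joined)


def ladder_size(units):
    """(base pairs, amino acids) contributed by a run of linker units."""
    return units * UNIT_BP, units * UNIT_BP // 3


for a, b in [(FORWARD, FORWARD), (FORWARD, REVERSE),
             (REVERSE, FORWARD), (REVERSE, REVERSE)]:
    print(a, b, junction_sites(a, b))

print(ladder_size(1), ladder_size(8), ladder_size(40))
```

Under these assumptions only the two mixed-orientation
junctions contain a site, so digestion cuts concatemers at
inverted repeats while leaving same-orientation runs intact.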

The linkers were ligated (as a pool of different insert
sizes or as gel-purified discrete fragments) into NarI-cleaved
OCVs M13MB48 or GemMB42 using standard methods. Following
ligation, the restriction enzyme NarI was added to remove the
self-ligating starting OCV (since linker insertion destroys
the NarI recognition sequence). This mixture was used to
transform competent XL1-Blue cells and appropriately plated
for plaques (OCV M13MB48) or ampicillin-resistant colonies
(OCV GemMB42).

The transformants were screened using dot blot DNA
analysis with one of two 32P-labeled oligonucleotide probes.
One probe consisted of a sequence complementary to the DNA
encoding the P1 loop of BPTI while the second had a sequence
complementary to the DNA encoding the domain interface region.
Suitable linker candidates would probe positively with the
first probe and negatively or poorly with the second.
Plaque-purified clones were used to generate phage stocks for
binding analyses and BPTI display, while the RF DNA derived
from phage-infected bacterial cells was used for restriction
enzyme analysis and sequencing. Representative insert sequences
of selected clones analyzed are as follows:

M13.3X4   (GG)C.GGA.TCC.TCC.TCC.CT(C.GCC)           (SEQ ID NO:103)
                gly ser ser ser leu
          (AA 6-10 of SEQ ID NO:11 = SEQ ID NO:150)

M13.3X7   (GG)C.GAG.GGA.GGA.GGA.TC(C.GCC)           (SEQ ID NO:104)
                glu gly gly gly ser                 (SEQ ID NO:10)

M13.3X11  (GG)C.GAG.GGA.GGA.GGA.TCC.GGA.TCC.TCC.
                glu gly gly gly ser gly ser ser
                TCC.CTC.GGA.TCC.TCC.TCC.CT(C.GCCC)  (SEQ ID NO:105)
                ser leu gly ser ser ser leu         (SEQ ID NO:18)

These highly flexible oligomeric linkers are believed to be
useful in joining a binding domain to the major coat (gene
VIII) protein of filamentous phage to facilitate the display
of the binding domain on the phage surface. They may also be
useful in the construction of chimeric OSPs for other genetic
packages as well.

EXAMPLE VIII

BACTERIAL EXPRESSION VECTORS

The expression vectors were designed for the bacterial
production of BPTI analogues resulting from the mutagenesis
and screening for variants with specific binding properties.
The expression vectors used are derivatives of the OCVs
M13MB48 and GemMB42. The conversion was achieved by replacing
the first codon of the mature VIII gene (codon 82 as shown in
Table 113) with a translational stop codon by site-specific
mutagenesis.

The salient points of the expression vector composition
are identical to those of the parent OCVs, namely a lacUV5
promoter (hence IPTG induction), ribosome binding site,
initiating methionine, phoA signal peptide and
transcriptional termination signal (see Table 113). The
placement of the stop codon allows for the expression of only
the first half of the fusion protein. The Gem-based expression
system, containing the genes encoding BPTI analogues, is
stored as plasmid DNA, being freshly transfected into cells
for expression of the analogue protein. The M13-based
expression system is stored as both RF DNA and as phage
stocks. The phage stocks are used to infect fresh bacterial
cells for expression of the protein of interest.
Bacterial Expression of BPTI and Analogues.

i. Gem-based expression vector and protocol.

The Gem-based expression vector is a derivative of the
OCV GemMB42 (Example I and Table 113). This vector, at least
when it contains the BPTI or analogue genes, has demonstrated
a degree of insert instability on prolonged growth in liquid
culture. To reduce the risk of this, the following protocol is
used.

Expression vector DNA (containing the BPTI or analogue
gene) is transfected into the E. coli strain XL1-Blue(TM),
which is plated on bacterial plates containing ampicillin and
allowed to incubate overnight at 37°C to give a dense
population of colonies. The colonies are scraped from the
plate with a glass spreader in 1 ml of NZCYM medium and
combined with the scraped cells from other duplicate plates.
This stock of cells is diluted approximately one hundred-fold
into NZCYM liquid medium containing ampicillin (100 µg per ml)
and allowed to grow in a shaking incubator to a cell density
of approximately half log (absorbance of 0.3 at 600 nm). IPTG
is added to a final concentration of 0.5 mM and the induced
culture is allowed to grow for a further two hours, after
which it is processed as described below.

ii. M13-based expression vector and protocol.

The M13-based expression vector is derived from OCV
M13MB48 (Example I). The BPTI gene (or analogue) is contained
within the intergenic region and its transcription is under
the control of a lacUV5 promoter, hence IPTG inducible. The
expression vector, containing the gene of interest, is
maintained and utilized as a phage stock. This method enables
a potentially lethal or deleterious gene to be supplied to a
bacterial culture and gene induction to occur only when the
bacterial culture has achieved sufficient mass. Poor growth
and insert instability can be circumvented to a large extent,
giving this system an advantage over the Gem-based vector
described above.

An overnight bacterial culture of XL1-Blue(TM) or SEF' is
grown in LB medium containing tetracycline (50 µg per ml) to
ensure the presence of pili as sites for bacteriophage binding
and infection. This culture is diluted 100-fold into NZCYM
medium containing tetracycline and bacterial growth is allowed
to proceed in an incubator shaker until a cell density of 1.0
(absorbance at 600 nm) has been achieved. Phage, containing the
expression vector and gene of interest, are added to the
bacterial culture at a multiplicity of infection (MOI) of 10
and allowed to infect the cells for 30 minutes. Gene expression
is then induced by the addition of IPTG to a final
concentration of 0.5 mM and the culture is allowed to grow
overnight. Media collection and cell fractionation are as
described elsewhere.

Bacterial Cell Fractionation.

After heterologous gene expression the bacterial cell
culture can be separated into the following fractions:
conditioned medium, periplasmic fraction and post-periplasmic
cell lysate. This is achieved using the following procedures.

The culture is centrifuged to pellet the bacteria,
allowing the supernatant to be stored as conditioned medium.
This fraction contains any exported proteins. The pellet is
taken up in 20% sucrose, 30 mM Tris pH 8 and 1 mM EDTA (80 ml
of buffer per gram of fresh weight pellet) and allowed to sit
at room temperature for 10 minutes. The cells are repelleted
and taken up in the same volume of ice-cold 5 mM MgSO4 and left
on ice for 10 minutes. Following centrifugation, to pellet the
cells, the supernatant (periplasmic fraction) is stored. A
second round of osmotic shock fractionation can be undertaken
if desired.

The post-periplasmic pellet can be further lysed as
follows. The pellet is resuspended in 1.5 ml of 20% sucrose,
40 mM Tris pH 8, 50 mM EDTA and 2.5 mg of lysozyme (per gram
fresh weight of starting pellet). After 15 minutes at room
temperature, 1.15 ml of 0.1% Triton X is added together with
300 µl of 5 M NaCl and incubated for a further 15 minutes.
2.5 ml of 0.2 M triethanolamine (pH 7.8), 150 µl of 1 M CaCl2,
100 µl of 1 M MgCl2 and 5 µg of DNase are added and allowed to
incubate, with end-over-end mixing, for 20 minutes to reduce
viscosity. This is followed by centrifugation, with the
supernatant being retained as the post-periplasmic lysate.
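
Because the fractionation amounts are quoted per gram of
fresh-weight pellet, scaling to a given pellet is simple
multiplication. The Python sketch below is illustrative only.
The text states the per-gram basis explicitly for the osmotic
shock buffer and for the first lysis mixture; treating every
later addition as per gram as well is an assumption of this
sketch, as are the dictionary keys and the function name.

```python
# Osmotic shock: 80 ml of buffer per gram, then the same volume of MgSO4.
OSMOTIC_SHOCK_ML_PER_G = 80.0

# Lysis additions, read from the text as amounts per gram of pellet.
# ASSUMPTION: all of them scale per gram, not only the first mixture.
LYSIS_PER_G = {
    "sucrose_tris_edta_ml": 1.5,
    "lysozyme_mg": 2.5,
    "triton_x_ml": 1.15,
    "nacl_5M_ul": 300.0,
    "triethanolamine_ml": 2.5,
    "cacl2_1M_ul": 150.0,
    "mgcl2_1M_ul": 100.0,
    "dnase_ug": 5.0,
}


def scale_reagents(pellet_g):
    """Return reagent amounts for a pellet of the given fresh weight."""
    if pellet_g <= 0:
        raise ValueError("pellet weight must be positive")
    amounts = {name: per_g * pellet_g for name, per_g in LYSIS_PER_G.items()}
    amounts["osmotic_shock_buffer_ml"] = OSMOTIC_SHOCK_ML_PER_G * pellet_g
    amounts["cold_mgso4_ml"] = OSMOTIC_SHOCK_ML_PER_G * pellet_g
    return amounts


# Example: a hypothetical 2 g pellet.
for name, amount in sorted(scale_reagents(2.0).items()):
    print(f"{name}: {amount:g}")
```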

The present invention is not, of course, limited to any
particular expression system, whether bacterial or not.

EXAMPLE IX

CONSTRUCTION OF AN ITI-DOMAIN I/GENE III DISPLAY VECTOR

1. ITI domain I as an IPBD

Inter-α-trypsin inhibitor (ITI) is a large (Mr ca. 240,000)
circulating protease inhibitor found in the plasma of many
mammalian species (for recent reviews see ODOM90, SALI90,
GEBH90, GEBH86). The intact inhibitor is a glycoprotein and
is currently believed to consist of three glycosylated
subunits that interact through a strong glycosaminoglycan
linkage (ODOM90, SALI90, ENGH89, SELL87). The anti-trypsin
activity of ITI is located on the smallest subunit (ITI light
chain, unglycosylated Mr ca. 15,000), which is identical in
amino acid sequence to an acid-stable inhibitor found in urine
(UTI) and serum (STI) (GEBH86, GEBH90). The mature light chain
consists of a 21 residue N-terminal sequence, glycosylated at
SER10, followed by two tandem Kunitz-type domains, the first of
which is glycosylated at ASN45 (ODOM90). In the human protein,
the second Kunitz-type domain has been shown to inhibit
trypsin, chymotrypsin, and plasmin (ALBR83a, ALBR83b, SELL87,
SWAI88). The first domain lacks these activities but has been
reported to inhibit leukocyte elastase (10^-6 > Ki > 10^-9)
(ALBR83a,b, ODOM90). cDNA encoding the ITI light chain also
codes for α-1-microglobulin (TRAB86, KAUM86, DIAR90); the
proteins are separated post-translationally by proteolysis.

The N-terminal Kunitz-type domain of the ITI light chain
(ITI-D1, comprising residues 22 to 76 of the UTI sequence shown
in Fig. 1 of GEBH86) possesses a number of characteristics that
make it useful as an IPBD. The domain is highly homologous to
both BPTI and the EpiNE series of proteins described elsewhere
in the present application. Although an x-ray structure of
the isolated domain is not available, crystallographic studies
of the related Kunitz-type domain isolated from the
Alzheimer's amyloid β-protein (AASP) precursor show that this
polypeptide assumes a crystal structure almost identical to
that of BPTI (HYNE90). Thus, it is likely that the solution
structure of the isolated ITI-D1 polypeptide will be highly
similar to the structures of BPTI and AASP. In this case, the
advantages described previously for use of BPTI as an IPBD
apply to ITI-D1. ITI-D1 provides additional advantages as an
IPBD for the development of specific anti-elastase inhibitory
activity. First, this domain has been reported to inhibit
both leukocyte elastase (ALBR83a,b, ODOM90) and Cathepsin-G
(SWAI88, ODOM90), activities which BPTI lacks. Second, ITI-D1
lacks affinity for the related serine proteases trypsin,
chymotrypsin, and plasmin (ALBR83a,b, SWAI88), an advantage
for the development of specificity in inhibition. Finally,
ITI-D1 is a human-derived polypeptide, so derivatives are
anticipated to show minimal antigenicity in clinical
applications.

2. Construction of the display vector.

For purposes of this discussion, numbering of the nucleic
acid sequence for the ITI light chain gene is that of TRAB86
and of the amino acid sequence is that shown for UTI in Fig. 1
of GEBH86. DNA manipulations were conducted according to
standard methods as described in SAMB89 and AUSU87.

The protein sequence of human ITI-D1 consists of 56 amino
acid residues extending from LYS22 to ARG77 of the complete ITI
light chain sequence. This sequence is encoded by the 168
bases between positions 750 and 917 in the cDNA sequence
presented in TRAB86. The majority of the domain is contained
between a BglI site spanning bases 763 to 773 and a PstI site
spanning bases 903 to 908. The insertion of the ITI-D1
sequence into M13 gene III was conducted in two steps. First,
a linker containing the appropriate ITI sequences outside the
central BglI to PstI region was ligated into the NarI site of
phage MA RF DNA. In the second step, the remainder of the
ITI-D1 sequence was incorporated into the linker-bearing phage
RF DNA.

The linker DNA consisted of two synthetic
oligonucleotides (top and bottom strands) which, when
annealed, produced a 54 bp double-stranded fragment with the
following structure (5' to 3'):

NARI OVERHANG/ITI-5'/BGLI/STUFFER/PSTI/ITI-3'/NARI OVERHANG

The NARI OVERHANG sequences provide compatible ends for
ligation into a cut NarI site. The ITI-5' sequence consists
of dsDNA corresponding to the thirteen positions from A750 to
T762 immediately 5' adjacent to the BglI site in the ITI-D1
sequence. Two changes, both silent, are introduced in this
sequence: T to C at position 758 (changes codon for ASP24 from
GAT to GAC) and G to T at position 761 (changes codon for SER25
from TCG to TCT). The sequences BGLI and PSTI are identical
to the BglI and PstI sites, respectively, in the ITI-D1
sequence. The ITI-3' sequence consists of dsDNA corresponding
to the nine positions from A909 to T917 immediately 3'
adjacent to the PstI site in the ITI-D1 sequence. The one
base change included in this sequence, A to T at position 917,
is silent and changes the codon for ARG77 from CGA to CGT. The
STUFFER sequence consists of dsDNA encoding three residues (5'
to 3'): LEU (TTA), TRP (TGG), and SER (TCA). The reverse
complement of the STUFFER sequence encodes two translation
termination codons (TGA and TAA). Phage expressing gene III
containing the linker in opposite orientation to that shown
above will not produce a functional gene III product.
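
The design constraints on the linker (silent codon changes and
a STUFFER whose reverse complement carries stop codons) can be
checked mechanically. The Python sketch below is illustrative
only; it assumes the standard genetic code, reads the reverse
complement of the STUFFER in the frame of its own first base,
and uses hypothetical variable names.

```python
from itertools import product

# Standard genetic code, built in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def codons(seq):
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def translate(seq):
    return "".join(CODON_TABLE[c] for c in codons(seq))


# STUFFER as given in the text: LEU (TTA), TRP (TGG), SER (TCA).
STUFFER = "TTATGGTCA"
stuffer_rc = reverse_complement(STUFFER)
stops_in_rc = [c for c in codons(stuffer_rc) if CODON_TABLE[c] == "*"]

# Codon changes described as silent: (original codon, linker codon).
SILENT_CHANGES = {
    "ASP24": ("GAT", "GAC"),
    "SER25": ("TCG", "TCT"),
    "ARG77": ("CGA", "CGT"),
}
all_silent = all(CODON_TABLE[old] == CODON_TABLE[new]
                 for old, new in SILENT_CHANGES.values())

# Length check: residues 22..77 and cDNA positions 750..917.
n_residues = 77 - 22 + 1
n_bases = 917 - 750 + 1

print(translate(STUFFER), stuffer_rc, stops_in_rc)
print(all_silent, n_residues, n_bases)
```

The stop codons in the reverse complement are what allow a
linker inserted in the wrong orientation to be recognized by
the loss of a functional gene III product.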

Phage MA RF DNA was digested with NarI and the linear ca.
8.2 kb fragment was gel purified and subsequently
dephosphorylated using HK phosphatase (Epicentre). The linker
oligonucleotides were annealed to form the linker fragment
described above, which was then kinased using T4
Polynucleotide Kinase. The kinased linker was ligated to the
NarI-digested MA RF DNA in a 10:1 (linker:RF) molar ratio.
After 18 hrs at 16°C, the ligation was stopped by incubation
at 65°C for 10 min and the ligation products were ethanol
precipitated in the presence of 10 µg of yeast tRNA. The
dried precipitate was dissolved in 5 µl of water and used to
transform D1210 cells by electroporation. After 60 min of
growth in SOC at 37°C, transformed cells were plated onto LB
plates supplemented with ampicillin (Ap, 200 µg/ml). RF DNA
prepared from Ap^r isolates was subjected to restriction enzyme
analysis. The DNA sequences of the linker insert and the
immediately surrounding regions were confirmed by DNA
sequencing. Phage strains containing the ITI linker sequence
inserted into the NarI site in gene III are called MA-IL.

Phage MA-IL RF DNA was partially digested with BglI and
the ca. 8.2 kb linear fragment was gel purified. This
fragment was digested with PstI and the large linear fragment
was gel purified. The BglI to PstI fragment of ITI-D1 was
isolated from pMGIA (a plasmid carrying the sequence shown in
TRAB86). pMGIA was digested to completion with BglI and the
ca. 1.6 kb fragment was isolated by agarose gel
electrophoresis and subsequent Geneclean (Bio101, La Jolla,
CA) purification. The purified BglI fragment was digested to
completion with PstI and EcoRI and the resulting mixture of
fragments was used in a ligation with the BglI and PstI cut
MA-IL RF DNA described above. Ligation, transformation, and
plating were as described above. After 18 hr of growth on LB
Ap plates at 37°C, Ap^r colonies were harvested with LB broth
supplemented with Ap (200 µg/ml) and the resulting cell
suspension was grown for two hours at 37°C. Cells were
pelleted by centrifugation (10 min at 5000xg, 4°C). The
supernatant fluid was transferred to sterile centrifugation
tubes and recentrifuged as above. The supernatant fluid from
the second centrifugation step was retained as the phage stock
POP1.

PCR was used to demonstrate the presence of phage
containing the complete ITI-D1-III fusion gene. Upstream PCR
primers, 1UP and 2UP, are located spanning nucleotides 1470 to
1494 and 1593 to 1618 of the phage M13 DNA sequence,
respectively. A downstream PCR primer, 3DN, spans nucleotides
1779 to 1804. Two ITI-D1-specific primers, IAI-1 and IAI-2,
are located spanning positions 789 to 810 and 894 to 914,
respectively, in the ITI light chain sequence of TRAB86. IAI-1
and IAI-2 are used as downstream primers in PCR reactions
with 1UP or 2UP. IAI-1 is entirely contained within the BglI
to PstI region of the ITI-D1 sequence, while IAI-2 spans the
PstI site in the ITI-D1 sequence. When aliquots of POP1 phage
were used as substrates for PCR, template-specific products of
characteristic size were produced in reactions containing 1UP
or 2UP plus IAI-1 or IAI-2 primer pairs. No such products are
obtained using MA-IL phage as template. No PCR products with
sizes corresponding to complete ITI-D1-gene III templates were
obtained using POP1 phage and the 1UP or 2UP plus 3DN primer
pairs. This last result reflects the low abundance (<1%) of
phage containing the complete ITI-D1 sequence in POP1.

Preparative PCR was used to generate substrate amounts of
the 330 bp PCR product of a reaction using the 1UP and IAI-2
primer pair to amplify the POP1 template. The 330 bp PCR
product was gel purified and then cut to completion with BglI
and PstI. The 138 bp BglI to PstI fragment from ITI-D1 was
isolated by agarose gel electrophoresis followed by Qiaex
extraction (Qiagen, Studio City, CA). MA-IL phage RF DNA was
digested to completion with PstI. The ca. 8.2 kb linear
fragment was gel purified and subsequently digested to
completion with BglI. The BglI digest was extracted once with
phenol:chloroform (1:1), the aqueous phase was ethanol
precipitated, and the pellet was dissolved in TE (pH 8.0). An
aliquot of this solution was used in a ligation reaction with
the 138 bp BglI to PstI fragment as described above. The
ethanol-precipitated ligation products were used to transform
XL1-Blue(TM) cells by electroporation and, after 1 hr growth in
SOC at 37°C, cells were plated on LB Ap plates. A phage
population, POP2, was prepared from Ap^r colonies as described
previously.

Phage stocks obtained from individual plaques produced on
titration of POP2 were tested by PCR for the presence of the
complete ITI-D1-III gene fusion. PCR results indicate the
entire fusion gene was present in seven of nine isolates
tested. RF DNA from the seven isolates testing positive was
subjected to restriction enzyme analysis. The complete
sequence of the ITI-D1 insertion into gene III was confirmed
in four of the seven isolates by DNA sequence analysis. Phage
isolates containing the ITI-D1-III fusion gene are called
MA-ITI.

3. Expression and display of ITI-D1.

Expression of the ITI domain I-gene III fusion protein
and its display on the surface of phage were demonstrated by
Western analysis and phage titer neutralization experiments.
For Western analysis, aliquots of PEG-purified phage
preparations containing up to 4 x 10^10 infective particles
were subjected to electrophoresis on a 12.5% SDS-urea-
polyacrylamide gel. Proteins were transferred to a sheet of
Immobilon-P transfer membrane (Millipore, Bedford, MA) by
electrotransfer. Western blots were developed using a rabbit
anti-ITI serum (SALI87) which had previously been incubated
with an E. coli extract, followed by goat anti-rabbit IgG
conjugated to horseradish peroxidase (#401315, Calbiochem, La
Jolla, CA). An immunoreactive protein with an apparent size
of ca. 65-69 kD is detected in preparations of MA-ITI phage
but not with preparations of the parental MA phage. The size
of the immunoreactive protein is consistent with the expected
size of the processed ITI-D1-III fusion protein (ca. 67 kD, as
previously observed for the BPTI-III fusion protein).

Rabbit anti-BPTI serum has been shown to block the
ability of MK-BPTI phage to infect E. coli cells (Example II).
To test for a similar effect of rabbit anti-ITI serum on the
infectivity of MA-ITI phage, 10 µl aliquots of MA or MA-ITI
phage were incubated in 100 µl reactions containing 10 µl
aliquots of PBS, normal rabbit serum (NRS), or anti-ITI serum.
After a three hour incubation at 37°C, phage suspensions were
titered to determine residual plaque-forming activity. These
data are summarized in Table 211. Incubation of MA-ITI phage
with rabbit anti-ITI serum reduces titers 10- to 100-fold,
depending on initial phage titer. A much smaller decrease in
phage titer (10 to 40%) is observed when MA-ITI phage are
incubated with NRS. In contrast, the titer of the parental MA
phage is unaffected by either NRS or anti-ITI serum.

Taken together, the results of the Western analysis and
the phage-titer neutralization experiments are consistent with
the expression of an ITI-D1-III fusion protein in MA-ITI
phage, but not in the parental MA phage, such that
ITI-specific epitopes are present on the phage surface. The
ITI-specific epitopes are located with respect to III such that
antibody binding to these epitopes prevents phage from
infecting E. coli cells.

4. Fractionation of MA-ITI phage bound to agarose-
immobilized protease beads.

To test if phage displaying the ITI-D1-III fusion protein
interact strongly with the proteases human neutrophil elastase
(HNE) or cathepsin-G, aliquots of display phage were incubated
with agarose-immobilized HNE or cathepsin-G beads (HNE beads
or Cat-G beads, respectively). The beads were washed and
bound phage eluted by pH fractionation as described in
Examples II and III. The progression in lowering pH during the
elution was: pH 7.0, 6.0, 5.5, 5.0, 4.5, 4.0, 3.5, 3.0, 2.5,
and 2.0. Following elution and neutralization, the various
input, wash, and pH elution fractions were titered.

The results of several fractionations are summarized in
Table 212 (EpiNE-7 or MA-ITI phage bound to HNE beads) and
Table 213 (EpiC-10 or MA-ITI phage bound to Cat-G beads). For
the two types of beads (HNE or Cat-G), the pH elution profiles
obtained using the control display phage (EpiNE-7 or EpiC-10,
respectively) were similar to those seen previously (Examples
II and III). About 0.3% of the EpiNE-7 display phage applied
to the HNE beads were eluted during the fractionation
procedure and the elution profile had a maximum for elution at
about pH 4.0. A smaller fraction, 0.02%, of the EpiC-10 phage
applied to the Cat-G beads were eluted and the elution profile
displayed a maximum near pH 5.5.

The MA-ITI phage show no evidence of great affinity for
either HNE or cathepsin-G immobilized on agarose beads. The
pH elution profiles for MA-ITI phage bound to HNE or Cat-G
beads show essentially monotonic decreases in phage recovered
with decreasing pH. Further, the total fractions of the phage
applied to the beads that were recovered during the
fractionation procedures were quite low: 0.002% from HNE beads
and 0.003% from Cat-G beads.

Published values of Ki for inhibition of neutrophil elastase
by the intact, large (Mr = 240,000) ITI protein range between 60
and 150 nM, and values between 20 and 6000 nM have been
reported for the inhibition of Cathepsin G by ITI (SWAI88,
ODOM90). Our own measurements of pH fractionation of display
phage bound to HNE beads show that phage displaying proteins
with low affinity (> µM) for HNE are not bound by the beads
while phage displaying proteins with greater affinity (nM) bind
to the beads and are eluted at about pH 5. If the first
Kunitz-type domain of the ITI light chain is entirely
responsible for the inhibitory activity of ITI against HNE, and
if this domain is correctly displayed on the MA-ITI phage, then
it appears that the minimum affinity of an inhibitor for HNE
that allows binding and fractionation of display phage on HNE
beads is 50 to 100 nM.

5. Alteration of the P1 region of ITI-D1.

If ITI-D1 and EpiNE-7 assume the same configuration in
solution as BPTI, then these two polypeptides have identical
amino acid sequences in both the primary and secondary binding
loops with the exception of four residues about the P1
position. For ITI-D1 the sequence for positions 15 to 20 is
(position 15 in ITI-D1 corresponds to position 36 in the UTI
sequence of GEBH86):

MET15, GLY16, MET17, THR18, SER19, ARG20. In EpiNE-7 the
equivalent sequence is: VAL15, ALA16, MET17, PHE18, PRO19,
ARG20. These two proteins appear to differ greatly in their
affinities for HNE. To improve the affinity of ITI-D1 for
HNE, the EpiNE-7 sequence shown above was incorporated into
the ITI-D1 sequence at positions 15 through 20.

The EpiNE-7 sequence was incorporated into the ITI-DI
sequence in MA-ITI by cassette mutagenesis. The mutagenic
cassette consisted of two synthetic 51 base oligonucleotides
(top and bottom strands) which were annealed to make double
stranded DNA containing an Eag I overhang at the 5' end and a
Sty I overhang at the 3' end. The DNA sequence between the
Eag I and Sty I overhangs is identical to the ITI-DI sequence
between these sites except at four codons: the codon for
position 15, ATG (MET), was changed to GTC (VAL), the codon for
position 16, GGA (GLY), was changed to GCT (ALA), the codon
for position 18, ACC (THR), was changed to TTC (PHE), and the
codon for position 19, AGC (SER), was changed to CCA (PRO).
MA-ITI RF DNA was digested with Eag I and Sty I. The large,
linear fragment was gel purified and used in a ligation with
the mutagenic cassette described above. Ligation products
were used to transform XL1-Blue(TM) cells as described
previously. Phage stocks obtained from overnight cultures of
Ap^r transductants were screened by PCR for incorporation of the
altered sequence and the changes in the codons for positions
15, 16, 18, and 19 were confirmed by DNA sequencing. Phage
isolates containing the ITI-DI-III fusion gene with the EpiNE-7
changes around the P1 position are called MA-ITI-E7.
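
The four codon replacements listed for the mutagenic cassette can be checked with a short script. This is only an illustrative sketch: the lookup table holds just the eight codons named in the text (standard genetic code), the parental codon at position 15 is taken to be ATG (the codon given for MET at positions 15 and 17 in Example X), and the names `CODON_TO_AA`, `CHANGES`, and `describe_changes` are hypothetical helpers, not part of the described procedure.

```python
# Minimal codon lookup covering only the codons named in the text
# (standard genetic code).
CODON_TO_AA = {
    "ATG": "MET", "GTC": "VAL",
    "GGA": "GLY", "GCT": "ALA",
    "ACC": "THR", "TTC": "PHE",
    "AGC": "SER", "CCA": "PRO",
}

# (position, parental codon, mutant codon) as listed for the cassette.
# ATG at position 15 is an assumption taken from Example X.
CHANGES = [
    (15, "ATG", "GTC"),
    (16, "GGA", "GCT"),
    (18, "ACC", "TTC"),
    (19, "AGC", "CCA"),
]


def describe_changes(changes):
    """Return (position, parental residue, mutant residue, bases changed)."""
    rows = []
    for pos, old, new in changes:
        n_diff = sum(a != b for a, b in zip(old, new))
        rows.append((pos, CODON_TO_AA[old], CODON_TO_AA[new], n_diff))
    return rows


for row in describe_changes(CHANGES):
    print(row)
```

Each row pairs a position with its parental and EpiNE-7 residue and counts how many of the three bases differ, which is the information a sequencing read would be compared against.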

6. Fractionation of MA-ITI-E7 phage.

To test if the changes at positions 15, 16, 18, and 19 of
the ITI-DI-III fusion protein influence binding of display
phage to HNE beads, abbreviated pH elution profiles were
measured. Aliquots of EpiNE-7, MA-ITI, and MA-ITI-E7 display
phage were incubated with HNE beads for three hours at room
temperature. The beads were washed and phage were eluted as
described (Example III), except that only three pH elutions
were performed: pH 7.0, 3.5, and 2.0. The results of these
elutions are shown in Table 214.

Binding and elution of the EpiNE-7 and MA-ITI display
phage were found to be as previously described. The total
fraction of input phage eluted from the beads was high (0.4%)
for EpiNE-7 phage and low (0.001%) for MA-ITI phage. Further,
the EpiNE-7 phage showed maximum phage elution in the pH 3.5
fraction while the MA-ITI phage showed only a monotonic
decrease in phage yields with decreasing pH, as seen above.



The two strains of MA-ITI-E7 phage show increased levels
of binding to HNE beads relative to MA-ITI phage. The total
fraction of the input phage eluted from the beads is 10-fold
greater for both MA-ITI-E7 phage strains than for MA-ITI phage
(although still 40-fold lower than EpiNE-7 phage). Further,
the pH elution profiles of the MA-ITI-E7 phage strains show
maximum elutions in the pH 3.5 fractions, similar to EpiNE-7
phage.

To further define the binding properties of MA-ITI-E7
phage, the extended pH fractionation procedure described
previously was performed using phage bound to HNE beads.
These data are summarized in Table 215. The pH elution
profile of EpiNE-7 display phage is as previously described.
In this more resolved pH elution profile, MA-ITI-E7 phage
show a broad elution maximum centered around pH 5. Once
again, the total fraction of MA-ITI-E7 phage obtained on pH
elution from HNE beads was about 40-fold less than that
obtained using EpiNE-7 display phage.

The pH elution behavior of MA-ITI-E7 phage bound to HNE
beads is qualitatively similar to that seen using
BPTI[K15L]-III-MA phage. BPTI with the K15L mutation has an
affinity for HNE of ~3 x 10^-9 M. Assuming all else remains the
same, the pH elution profile for MA-ITI-E7 suggests that the
affinity of the free ITI-DI-E7 domain for HNE might be in the
nM range. If this is the case, the substitution of the EpiNE-7
sequence in place of the ITI-DI sequence around the P1 region
has produced a 20- to 50-fold increase in affinity for HNE
(assuming K_i = 60 to 150 nM for the unaltered ITI-DI).
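
The fold-increase estimate can be reproduced with simple arithmetic. The sketch below assumes only the figures quoted in the paragraph above (a parental K_i of 60 to 150 nM and a 20- to 50-fold improvement); the function name `implied_ki_nM` is hypothetical.

```python
def implied_ki_nM(parent_ki_nM, fold_increase):
    """K_i implied for the altered domain, given the parental K_i and
    an assumed fold increase in affinity."""
    return parent_ki_nM / fold_increase


# Matching ends of the two quoted ranges.
matched_low = implied_ki_nM(60.0, 20.0)
matched_high = implied_ki_nM(150.0, 50.0)

# Extreme combinations of the two ranges.
tightest = implied_ki_nM(60.0, 50.0)
weakest = implied_ki_nM(150.0, 20.0)

print(matched_low, matched_high, tightest, weakest)
```

Pairing the matching ends of the two ranges gives about 3 nM in both cases, in line with the ~3 x 10^-9 M affinity quoted for BPTI[K15L]; the extreme pairings span roughly 1 to 8 nM, all within the nM range suggested by the elution profile.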

If EpiNE-7 and ITI-DI-E7 have the same solution
structure, these proteins present the identical amino acid
sequences to HNE over the interaction surface. Despite this
similarity, EpiNE-7 exhibits a roughly 1000-fold greater
affinity for HNE than does ITI-DI-E7. Again assuming similar
structure, this observation highlights the importance of
non-contacting secondary residues in modulating interaction
strengths.

Native ITI light chain is glycosylated at two positions,
SER10 and ASN45 (GEBH86). Removal of the glycosaminoglycan
chains has been shown to decrease the affinity of the
inhibitor for HNE about 5-fold (SELL87). Another potentially
important difference between EpiNE-7 and ITI-DI-E7 is that of
net charge. The changes in BPTI that produce EpiNE-7 reduce
the total charge on the molecule from +6 to +1. Sequence
differences between EpiNE-7 and ITI-DI-E7 further reduce the
charge on the latter to -1. Furthermore, the change in net
charge between these two molecules arises from sequence
differences occurring in the central portions of the
molecules. Position 26 is LYS in EpiNE-7 and is THR in
ITI-DI-E7, while at position 31 these residues are GLN and GLU,
respectively. These changes in sequence not only alter the
net charge on the molecules but also position a negatively
charged residue close to the interaction surface in ITI-DI-E7.
It may be that the occurrence of a negative charge at position
31 (which is not found in any other of the HNE inhibitors
described here) destabilizes the inhibitor-protease
interaction.


EXAMPLE X 

GENERATION OF A VARIEGATED ITI-DI POPULATION 

The following is a hypothetical example demonstrating how
to obtain a derivative of ITI having high affinity for HNE.

The results of Example IX demonstrate that the nature of
the protein sequence around the P1 position in ITI-DI can
significantly influence the strength of the interaction
between ITI-DI and HNE. While incorporation of the EpiNE-7
sequence increases the affinity of ITI-DI for HNE, it is
unlikely that this particular sequence is optimal for binding.
We generate a large population of potential binding
proteins having differing sequences in the P1 region of ITI-DI
using the oligonucleotide ITIMUT. ITIMUT is designed to
incorporate variegation in ITI-DI at the six positions about
and including the P1 residue: 13, 15, 16, 17, 18, and 19.
ITIMUT is synthesized as one long (top strand) 73 base
oligonucleotide and one shorter (24 base) bottom strand
oligonucleotide. The top strand sequence extends from
position 770 (G) to position 842 (G) in the sequence of
TREB86. This sequence includes the codons for the positions
of variegation as well as the recognition sequences for the
flanking restriction enzymes Eag I (778 to 783) and Sty I (829
to 834). The bottom strand oligonucleotide comprises the
complement of the sequence from positions 819 to 842.

To generate the mutagenic cassette, the top and bottom
strand oligonucleotides are annealed and the resulting duplex
is completed in an extension reaction using DNA polymerase.
Following digestion of the 73 bp dsDNA with Eag I and Sty I,
the purified 51 bp mutagenic cassette is ligated with the
large linear fragment obtained from a similar digestion of
MA-ITI RF DNA. Ligation products are used to transform competent
cells by electroporation and phage stocks produced from Ap^r
transductants are analyzed for the presence and nature of
novel sequences as described previously.

The variegation in the ITIMUT cassette is confined to the
codons for the six positions in ITI-DI (13, 15, 16, 17, 18,
and 19), and employs three different nucleotide mixes: N, R,
and S. For this mutagenesis, the composition of the N-mix is
36%A, 17%C, 23%G, and 24%T, and corresponds to the N-mix
composition in the optimized NNS codon described elsewhere.
The R-mix composition is 50%A, 50%G, and the S-mix composition
is 50%C, 50%G.

The codon for ITI-DI position 13 (CCC, PRO) is changed to
SNG in ITIMUT. This codon encodes the eight residues PRO,
VAL, GLU, ALA, GLY, LEU, GLN, and ARG. The encoded group
includes the parental residue (PRO) as well as the more
commonly observed variants at the position, ARG and LEU (see
Table 15), and also provides for the occurrence of acidic
(GLU), large polar (GLN) and nonpolar (VAL), and small (ALA,
GLY) residues.

The codons for positions 15 and 17 (ATG, MET) are changed
to the optimized NNS codon. All 20 natural amino acid
residues and a translation termination are allowed.

The codon for position 16 (GGA, GLY) is changed to RNS in
ITIMUT. This codon encodes the twelve amino acids GLY, ALA,
ASP, GLU, VAL, MET, ILE, THR, SER, ARG, ASN, and LYS. The
encoded group includes the most commonly observed residues at
this position, ALA and GLY, and provides for the occurrence of
both positively (ARG, LYS) and negatively (GLU, ASP) charged
amino acids. Large nonpolar residues are also included (ILE,
MET, VAL).

Finally, at positions 18 and 19, the ITI-DI sequence is
changed from ACC·AGC (THR·SER) to NNT·NNT. The NNT codon
encodes the fifteen amino acid residues PHE, SER, TYR, CYS,
LEU, PRO, HIS, ARG, ILE, THR, ASN, VAL, ALA, ASP, and GLY.
This group includes the parental residues; the further
advantages of the NNT codon have been discussed elsewhere.

The ITIMUT DNA sequence encodes a total of:
     8 * 20 * 12 * 20 * 15 * 15 = 8,640,000
different protein sequences in a total of:
     8 * 32 * 16 * 32 * 16 * 16 = 33,554,432
different DNA sequences. The total number of protein
sequences encoded by ITIMUT is only 7.4-fold fewer than the
total possible number of natural sequences obtained from
variation at six positions (= 20^6 = 6.4 x 10^7). However, that
full degree of variation in protein sequence is obtained from a
minimum of 1.07 x 10^9 (NNS^6 = 2^30) DNA sequences, a 32-fold
greater number than that comprising ITIMUT. Thus, ITIMUT is
an efficient vehicle for the generation of a large and diverse
population of potential binding proteins.
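
The residue sets and sequence counts quoted for ITIMUT can be re-derived by expanding each variegated codon against the standard genetic code. The following is a minimal sketch, not the procedure used in the example: it treats N, R, and S as plain ambiguity sets (the skewed N-mix changes codon frequencies, not which codons are possible), excludes the stop codon when counting protein sequences, and uses hypothetical names (`MIX`, `ITIMUT`, `expand`, `residues`).

```python
from itertools import product
from math import prod

# Standard genetic code, bases ordered T, C, A, G ("*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Ambiguity letters used in ITIMUT, treated as unweighted base sets.
MIX = {"A": "A", "C": "C", "G": "G", "T": "T",
       "N": "ACGT", "R": "AG", "S": "CG"}

# Variegated codon at each ITI-DI position, as given in the text.
ITIMUT = {13: "SNG", 15: "NNS", 16: "RNS", 17: "NNS", 18: "NNT", 19: "NNT"}


def expand(codon):
    """All concrete codons matched by a variegated codon."""
    return ["".join(p) for p in product(*(MIX[b] for b in codon))]


def residues(codon):
    """Amino acids (one-letter codes) encoded, stop codons excluded."""
    return {CODON_TABLE[c] for c in expand(codon)} - {"*"}


n_dna = prod(len(expand(c)) for c in ITIMUT.values())
n_protein = prod(len(residues(c)) for c in ITIMUT.values())

for pos, codon in ITIMUT.items():
    print(pos, codon, len(expand(codon)), "".join(sorted(residues(codon))))
print("DNA sequences:", n_dna)
print("protein sequences:", n_protein)
print("20^6 / protein sequences:", round(20**6 / n_protein, 1))
print("32^6 / DNA sequences:", 32**6 // n_dna)
```

Comparing `n_protein` with 20^6 and `n_dna` with 32^6 gives the two ratios discussed in the text; swapping in a different codon design in the `ITIMUT` dictionary shows how the trade-off between protein diversity and DNA library size shifts.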

EXAMPLE XI

DEVELOPMENT AND SELECTION OF BPTI MUTANTS FOR
BINDING TO HORSE HEART MYOGLOBIN (HHMB)

The following example is hypothetical and illustrates
alternative embodiments of the invention not given in other
examples.

HHMb is chosen as a typical protein target; any other
protein could be used. HHMb satisfies all of the criteria for
a target: 1) it is large enough to be applied to an affinity
matrix, 2) after attachment it is not reactive, and 3) after
attachment there is sufficient unaltered surface to allow
specific binding by PBDs.

The essential information for HHMb is known: 1) HHMb is
stable at least up to 70°C, between pH 4.4 and 9.3, 2) HHMb is
stable up to 1.6 M guanidinium Cl, 3) the pI of HHMb is 7.0,
4) for HHMb, M_r = 16,000, 5) HHMb requires haem, 6) HHMb has no
proteolytic activity.

In addition, the following information about HHMb and
other myoglobins is available: 1) the sequence of HHMb is
known, 2) the 3D structure of sperm whale myoglobin is known;
HHMb has 19 amino acid differences and it is generally assumed
that the 3D structures are almost identical, 3) HHMb has no
enzymatic activity, 4) HHMb is not toxic.

We set the specifications of an SBD as:
1) T = 25°C; 2) pH = 8.0; 3) Acceptable solutes ((A) for
binding: i) phosphate, as buffer, 0 to 20 mM, and ii) KCl, 10
mM; (B) for column elution: i) phosphate, as buffer, 0 to 30
mM, ii) KCl, up to 5 M, and iii) guanidinium Cl, up to 0.8
M); 4) Acceptable K_d < 1.0 x 10^-8 M.

As stated in Sec. III.B, the residues to be varied are
picked, in part, through the use of interactive computer
graphics to visualize the structures. In this example, all
residue numbers refer to BPTI. We pick a set of residues that
forms a surface such that all residues can contact one target
molecule. Information that we refer to during the process of
choosing residues to vary includes: 1) the 3D structure of
BPTI, 2) solvent accessibility of each residue as computed by
the method of Lee and Richards (LEEB71), 3) a compilation of
sequences of other proteins homologous to BPTI, and 4)
knowledge of the structural nature of different amino acid
types.

Tables 16 and 34 indicate which residues of BPTI: a) have
substantial surface exposure, and b) are known to tolerate
other amino acids in other closely related proteins. We use
interactive computer graphics to pick sets of eight to twenty
residues that are exposed and variable and such that all
members of one set can touch a molecule of the target material
at one time. If BPTI has a small amino acid at a given
residue, that amino acid may not be able to contact the target
simultaneously with all the other residues in the interaction
set, but a larger amino acid might well make contact. A
charged amino acid might affect binding without making direct
contact. In such cases, the residue should be included in the
interaction set, with a notation that larger residues might be
useful. In a similar way, large amino acids near the
geometric center of the interaction set may prevent residues
on either side of the large central residue from making
simultaneous contact. If a small amino acid, however, were
substituted for the large amino acid, then the surface would
become flatter and residues on either side could make
simultaneous contact. Such a residue should be included in
the interaction set with a notation that small amino acids may
be useful.

Table 35 was prepared from standard model parts and shows
the maximum span between Cbeta and the tip of each type of side
group. Cbeta is used because it is rigidly attached to the
protein main-chain; rotation about the Calpha-Cbeta bond is the
most important degree of freedom for determining the location
of the side group.

Table 34 indicates five surfaces that meet the given
criteria. The first surface comprises the set of residues
that actually contacts trypsin in the complex of trypsin with
BPTI as reported in the Brookhaven Protein Data Bank entry
"1TPA". This set is indicated by the number "1". The exposed
surface of the residues in this set (taken from Table 16)
totals 1148 Å^2. Although this is not strictly the area of
contact between BPTI and trypsin, it is approximately the
same.

Other surfaces, numbered 2 to 5, were picked by first
picking one exposed, variable residue and then picking
neighboring residues until a surface was defined. The choice
of sets of residues shown in Table 34 is in no way exhaustive
or unique; other sets of variable, surface residues can be
picked. Set #2 is shown in stereo view, Figure 14, including
the alpha carbons of BPTI, the disulfide linkages, and the side
groups of the set. We take the orientation of BPTI in Figure
14 as a standard orientation and hereinafter refer to K15 as
being at the top of the molecule, while the carboxy and amino
termini are at the bottom.

Solvent accessibilities are useful, easily tabulated
indicators of a residue's exposure. Solvent accessibilities
must be used with some caution; small amino acids are
under-represented and large amino acids over-represented. The
user must consider what the solvent accessibility of a different
amino acid would be when substituted into the structure of
BPTI.

To create specific binding between a derivative of BPTI
and HHMb, we will vary the residues in set #2. This set
includes the twelve principal residues 17(R), 19(I), 21(Y),
27(A), 28(G), 29(L), 31(Q), 32(T), 34(V), 48(A), 49(E), and
52(M) (Sec. III.B). None of the residues in set #2 is
completely conserved in the sample of sequences reported in
Table 34; thus we can vary them with a high probability of
retaining the underlying structure. Independent substitution
at each of these twelve residues of the amino acid types
observed at that residue would produce approximately 4.4 x 10^9
amino acid sequences and the same number of surfaces.

BPTI is a very basic protein. This property has been
used in isolating and purifying BPTI and its homologues so
that the high frequency of arginine and lysine residues may
reflect bias in isolation and is not necessarily required by
the structure. Indeed, SCI-III from Bombyx mori contains
seven more acidic than basic groups (SASA84).

Residue 17 is highly variable and fully exposed and can
contain R, K, A, Y, H, F, L, M, T, G, Y, P, or S. All types
of amino acids are seen: large, small, charged, neutral, and
hydrophobic. That no acidic groups are observed may be due to
bias in the sample.

Residue 19 is also variable and fully exposed, containing
P, R, I, S, K, Q, and L.

Residue 21 is not very variable, containing F or Y in 31
of 33 cases and I and W in the remaining cases. The side
group of Y21 fills the space between T32 and the main chain of
residues 47 and 48. The OH at the tip of the Y side group
projects into the solvent. Clearly one can vary the surface
by substituting Y or F so that the surface is either
hydrophobic or hydrophilic in that region. It is also
possible that the other aromatic amino acid (viz. H) or the
other hydrophobics (L, M, or V) might be tolerated.

Residue 27 most often contains A, but S, K, L, and T are
also observed. On structural grounds, this residue will
probably tolerate any hydrophilic amino acid and perhaps any
amino acid.

Residue 28 is G in BPTI. This residue is in a turn, but
is not in a conformation peculiar to glycine. Six other types
of amino acids have been observed at this residue: K, N, Q, R,
H, and N. Small side groups at this residue might not contact
HHMb simultaneously with residues 17 and 34. Large side
groups could interact with HHMb at the same time as residues
17 and 34. Charged side groups at this residue could affect
binding of HHMb on the surface defined by the other residues
of the principal set. Any amino acid, except perhaps P,
should be tolerated.

Residue 29 is highly variable, most often containing L.
This fully exposed position will probably tolerate almost any
amino acid except, perhaps, P.

Residues 31, 32, and 34 are highly variable, exposed, and
in extended conformations; any amino acid should be tolerated.

Residues 48 and 49 are also highly variable and fully
exposed; any amino acid should be tolerated.

Residue 52 is in an alpha helix. Any amino acid, except
perhaps P, might be tolerated.

Now we consider possible variation of the secondary set
(Sec. 13.1.2) of residues that are in the neighborhood of the
principal set. Neighboring residues that might be varied at
later stages include 9(P), 11(T), 15(K), 16(A), 18(I), 20(R),
22(F), 24(N), 26(K), 35(Y), 47(S), 50(D), and 53(R).

Residue 9 is highly variable, extended, and exposed.
Residue 9 and residues 48 and 49 are separated by a bulge
caused by the ascending chain from residue 31 to 34. For
residue 9 and residues 48 and 49 to contribute simultaneously
to binding, either the target must have a groove into which
the chain from 31 to 34 can fit, or all three residues (9, 48,
and 49) must have large amino acids that effectively reduce
the radius of curvature of the BPTI derivative.

Residue 11 is highly variable, extended, and exposed.
Residue 11, like residue 9, is slightly far from the surface
defined by the principal residues and will contribute to
binding in the same circumstances.

Residue 15 is highly varied. The side group of residue
15 points away from the face defined by set #2. Changes of
charge at residue 15 could affect binding on the surface
defined by residue set #2.

Residue 16 is varied but points away from the surface
defined by the principal set. Changes in charge at this
residue could affect binding on the face defined by set #2.

Residue 18 is I in BPTI. This residue is in an extended
conformation and is exposed. Five other amino acids have been
observed at this residue: M, F, L, V, and T. Only T is
hydrophilic. The side group points directly away from the
surface defined by residue set #2. Substitution of charged
amino acids at this residue could affect binding at the surface
defined by residue set #2.

Residue 20 is R in BPTI. This residue is in an extended
conformation and is exposed. Four other amino acids have been
observed at this residue: A, S, L, and Q. The side group
points directly away from the surface defined by residue set
#2. Alteration of the charge at this residue could affect
binding at the surface defined by residue set #2.

Residue 22 is only slightly varied, being Y, F, or H in
30 of 33 cases. Nevertheless, A, N, and S have been observed
at this residue. Amino acids such as L, M, I, or Q could be
tried here. Alterations at residue 22 may affect the mobility
of residue 21; changes in charge at residue 22 could affect
binding at the surface defined by residue set #2.

Residue 24 shows some variation, but probably cannot
interact with one molecule of the target simultaneously with
all the residues in the principal set. Variation in charge at
this residue might have an effect on binding at the surface
defined by the principal set.

Residue 26 is highly varied and exposed. Changes in
charge may affect binding at the surface defined by residue
set #2; substitutions may affect the mobility of residue 27
that is in the principal set.

Residue 35 is most often Y; W has also been observed. The
side group of 35 is buried, but substitution of F or W could
affect the mobility of residue 34.

Residue 47 is always T or S in the sequence sample used.
The Ogamma probably accepts a hydrogen bond from the NH of
residue 50 in the alpha helix. Nevertheless, there is no
overwhelming steric reason to preclude other amino acid types
at this residue. In particular, other amino acids the side
groups of which can accept hydrogen bonds, viz. N, D, Q, and
E, may be acceptable here.

Residue 50 is often an acidic amino acid, but other amino
acids are possible.

Residue 53 is often R, but other amino acids have been
observed at this residue. Changes of charge may affect
binding to the amino acids in interaction set #2.

Stereo Figure 14 shows the residues in set #2, plus R39.
From Figure 14, one can see that R39 is on the opposite side
of BPTI from the surface defined by the residues in set #2.
Therefore, variation at residue 39 at the same time as
variation of some residues in set #2 is much less likely to
improve binding that occurs along surface #2 than is variation
of the other residues in set #2.

In addition to the twelve principal residues and 13
secondary residues, there are two other residues, 30(C) and
33(F), involved in surface #2 that we will probably not vary,
at least not until late in the procedure. These residues have
their side groups buried inside BPTI and are conserved.
Changing these residues does not change the surface nearly so
much as does changing residues in the principal set. These
buried, conserved residues do, however, contribute to the
surface area of surface #2. The surface of residue set #2 is
comparable to the area of the trypsin-binding surface.
Principal residues 17, 19, 21, 27, 28, 29, 31, 32, 34, 48, 49,
and 52 have a combined solvent-accessible area of 946.9 Å^2.
Secondary residues 9, 11, 15, 16, 18, 20, 22, 24, 26, 35, 47,
50, and 53 have combined surface of 1041.7 Å^2. Residues 30 and
33 have exposed surface totaling 38.2 Å^2. Thus the three
groups' combined surface is 2026.8 Å^2.

Residue 30 is C in BPTI and is conserved in all
homologous sequences. It should be noted, however, that
C14/C38 is conserved in all natural sequences, yet Marks et
al. (MARK87) showed that changing both C14 and C38 to A,A or
T,T yields a functional trypsin inhibitor. Thus it is
possible that BPTI-like molecules will fold if C30 is
replaced.

Residue 33 is F in BPTI and in all homologous sequences. 
Visual inspection of the BPTI structure suggests that 
substitution of Y, M, H, or L might be tolerated. 

Having identified twenty residues that define a possible
binding surface, we must choose some to vary first. Assuming
a hypothetical affinity separation sensitivity, C_sensi, of 1 in
4 x 10^8, we decide to vary six residues (leaving some margin for
error in the actual base composition of variegated bases). To
obtain maximal recognition, we choose residues from the
principal set that are as far apart as possible. Table 36
shows the distances between the beta carbons of residues in the
principal and peripheral set. R17 and V34 are at one end of
the principal surface. Residues A27, G28, L29, A48, E49, and
M52 are at the other end, about twenty Angstroms away; of
these, we will vary residues 17, 27, 29, 34, and 48. Residues
28, 49, and 52 will be varied at later rounds.

Of the remaining principal residues, 21 is left to later 
variations. Among residues 19, 31, and 32, we arbitrarily 
pick 19 to vary. 

Unlimited variation of six residues produces 6.4 x 10^7 amino
acid sequences. By hypothesis, C_sensi is 1 in 4 x 10^8. Table 37
shows the programmed variegation at the chosen residues. The
parental sequence is present as 1 part in 5.5 x 10^7, but the
least favored sequences are present at only 1 part in 4.2 x 10^9.
Among single-amino-acid substitutions from the PPBD, the
least favored is F17-I19-A27-L29-V34-A48 and has a calculated
abundance of 1 part in 1.6 x 10^8. Using the optimal qfk codon,
we can recover the parental sequence and all one-amino-acid
substitutions to the PPBD if actual nt compositions come
within 5% of programmed compositions. The number of
transformants is M_ntv = 1.0 x 10^9 (also by hypothesis), thus we
will produce most of the programmed sequences.
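
One way to read the relationship between transformant number and sequence abundance is through a simple sampling model. The sketch below is an assumption added for illustration, not a calculation from the example: it treats each transformant as an independent draw in which a given sequence occurs with probability equal to its abundance, so that the chance of seeing the sequence at least once among N transformants is approximately 1 - exp(-N x abundance). The function name `prob_present` is hypothetical; the abundances and M_ntv are the figures quoted above.

```python
import math


def prob_present(n_transformants, abundance):
    """Poisson approximation to the probability that a sequence of the
    given abundance occurs at least once among the transformants."""
    return 1.0 - math.exp(-n_transformants * abundance)


M_NTV = 1.0e9  # number of transformants (by hypothesis in the text)

cases = {
    "parental": 1 / 5.5e7,
    "least favored single substitution": 1 / 1.6e8,
    "least favored sequence": 1 / 4.2e9,
}

for label, abundance in cases.items():
    print(f"{label}: {prob_present(M_NTV, abundance):.3f}")
```

Under this model the parental sequence and every single substitution are almost certain to be represented, while the rarest sequences are each present only some of the time, which is one interpretation of "most of the programmed sequences".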

The residue numbers of the preceding section are referred
to mature BPTI (R1-P2-...-A58). Table 25 has residue numbers
referring to the pre-M13CP-BPTI protein; all mature BPTI
sequence numbers have been increased by the length of the
signal sequence, i.e. 23. Thus in terms of the pre-OSP-PBD
residue numbers, we wish to vary residues 40, 42, 50, 52, 57,
and 71. A DNA subsequence containing all these codons is
found between the (Apa I/Dra II/Pss I) sites at base 191 and the
Sph I site at base 309 of the osp-pbd gene. Among Apa I, Dra II,
and Pss I, Apa I is preferred because it recognizes six bases
without any ambiguity. Dra II and Pss I, on the other hand,
recognize six bases with two-fold ambiguity at two of the
bases. The vgDNA will contain more Dra II and Pss I recognition
sites at the varied locations than it will contain Apa I
recognition sites. The unwanted extraneous cutting of the
vgDNA by Apa I and Sph I will eliminate a few sequences from our
population. This is a minor problem, but by using the more
specific enzyme (Apa I), we minimize the unwanted effects. The
sequence shown in Table 37 illustrates an additional way in
which gratuitous restriction sites can be avoided in some
cases. The osp-ipbd gene had the codon GGC for G51; because
we are varying both residue 50 and 52, it is possible to
obtain an Apa I site. If we change the glycine codon to GGT,
the Apa I site can no longer arise. Apa I recognizes the DNA
sequence (GGGCC/C).
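
The effect of the GGC to GGT change at codon 51 can be illustrated by brute-force enumeration. This is a sketch under stated assumptions: codons 50 and 52 are modeled as free in their first two bases and limited to G or T in the third (as implied by the later statement that the complement of k is an equal mix of C and A), only the nine bases of codons 50 to 52 are examined, and the function name `site_can_arise` is hypothetical.

```python
from itertools import product

APA_I_SITE = "GGGCCC"  # Apa I recognition sequence (cut GGGCC/C)


def site_can_arise(gly_codon, third_bases="GT"):
    """True if some choice of codons 50 and 52 creates an Apa I site.

    Codons 50 and 52 are modeled as fully variable in the first two
    bases and restricted to `third_bases` in the third (assumption).
    Only the nine bases of codons 50-51-52 are examined; flanking
    fixed sequence is ignored.
    """
    varied = ["".join(c) for c in product("ACGT", "ACGT", third_bases)]
    for c50 in varied:
        for c52 in varied:
            if APA_I_SITE in c50 + gly_codon + c52:
                return True
    return False


print("GGC:", site_can_arise("GGC"))
print("GGT:", site_can_arise("GGT"))
```

With GGC, a codon 50 ending in G followed by a codon 52 beginning CC completes the site; with GGT no combination does, even if the third-base restriction is lifted. Because GGGCCC is its own reverse complement, checking one strand is sufficient.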

Each piece of dsDNA to be synthesized needs six to eight
bases added at either end to allow cutting with restriction
enzymes and is shown in Table 37. The first synthetic base
(before cutting with Apa I and Sph I) is 184 and the last is
322. There are 142 bases to be synthesized. The center of
the piece to be synthesized lies between Q54 and V57. The
overlap cannot include varied bases, so we choose bases 245
to 256 as the overlap that is 12 bases long. Note that the
codon for F56 has been changed to TTC to increase the GC
content of the overlap. The amino acids that are being varied
are marked as X with a plus over them. Codons 57 and 71 are
synthesized on the sense (bottom) strand. The design calls
for "qfk" in the antisense strand, so that the sense strand
contains (from 5' to 3') a) equal parts C and A (i.e. the
complement of k), b) (0.40 T, 0.22 A, 0.22 C, and 0.16 G)
(i.e. the complement of f), and c) (0.26 T, 0.26 A, 0.30 C,
and 0.18 G).

Each residue that is encoded by "qfk" has 21 possible
outcomes, each of the amino acids plus stop. Table 12 gives
the distribution of amino acids encoded by "qfk", assuming 5%
errors. The abundance of the parental sequence is the product
of the abundances of R x I x A x L x V x A. The abundance of
the least-favored sequence is 1 in 4.2 x 10^9.
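
The nominal amino acid distribution of a "qfk" codon can be estimated from the strand compositions given for codons 57 and 71. The sketch below rests on assumptions that should be read as such: the third mix listed, (0.26 T, 0.26 A, 0.30 C, 0.18 G), is taken to be the complement of q; base compositions are taken exactly as programmed, so the numbers will differ somewhat from Table 12, which assumes 5% errors; and the names `SYNTHESIZED`, `complement_mix`, and `aa_distribution` are hypothetical.

```python
from math import prod

# Standard genetic code, bases ordered T, C, A, G ("*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Base mixes on the synthesized strand, 5' to 3', as quoted in the text.
SYNTHESIZED = [
    {"C": 0.50, "A": 0.50},                          # complement of k
    {"T": 0.40, "A": 0.22, "C": 0.22, "G": 0.16},    # complement of f
    {"T": 0.26, "A": 0.26, "C": 0.30, "G": 0.18},    # assumed complement of q
]


def complement_mix(mix):
    """Base composition of the opposite strand at the same position."""
    return {COMPLEMENT[base]: frac for base, frac in mix.items()}


# The coding codon is the reverse complement of the synthesized triplet.
q = complement_mix(SYNTHESIZED[2])
f = complement_mix(SYNTHESIZED[1])
k = complement_mix(SYNTHESIZED[0])


def aa_distribution(first, second, third):
    """Probability of each amino acid (or '*') for one variegated codon."""
    dist = {}
    for b1, p1 in first.items():
        for b2, p2 in second.items():
            for b3, p3 in third.items():
                aa = CODON_TABLE[b1 + b2 + b3]
                dist[aa] = dist.get(aa, 0.0) + p1 * p2 * p3
    return dist


dist = aa_distribution(q, f, k)
parental = prod(dist[aa] for aa in "RIALVA")   # R17-I19-A27-L29-V34-A48
rarest = min(p for aa, p in dist.items() if aa != "*") ** 6

print("outcomes per codon:", len(dist))
print(f"parental sequence: 1 in {1 / parental:.2e}")
print(f"rarest six-residue sequence: 1 in {1 / rarest:.2e}")
```

The parental abundance comes out near the 1 in 5.5 x 10^7 quoted above; the rarest-sequence estimate is more sensitive to small composition errors, so it is expected to be less extreme than the Table 12 based figure.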

Olig#27 and olig#28 are annealed and extended with Klenow
fragment and all four (nt)TPs. Both the ds synthetic DNA and
RF pLG7 DNA are cut with both Apa I and Sph I. The cut DNA is
purified and the appropriate pieces ligated (See Sec. 14.1)
and used to transform competent PE383 (Sec. 14.2). In order
to generate a sufficient number of transformants, V_c is set to
5000 ml.

1)  culture E. coli in 5.0 l of LB broth at 37°C until cell
    density reaches 5 x 10^7 to 7 x 10^7 cells/ml,
2)  chill on ice for 65 minutes, centrifuge the cell
    suspension at 4000 g for 5 minutes at 4°C,
3)  discard supernatant; resuspend the cells in 1667 ml of an
    ice-cold, sterile solution of 60 mM CaCl2,
4)  chill on ice for 15 minutes, and then centrifuge at 4000 g
    for 5 minutes at 4°C,
5)  discard supernatant; resuspend cells in 2 x 400 ml of
    ice-cold, sterile 60 mM CaCl2; store cells at 4°C for 24
    hours,
6)  add DNA in ligation or TE buffer; mix and store on ice
    for 30 minutes; 20 ml of solution containing 5 µg/ml of
    DNA is used,
7)  heat shock cells at 42°C for 90 seconds,
8)  add 200 ml LB broth and incubate at 37°C for 1 hour,
9)  add the culture to 2.0 l of LB broth containing
    ampicillin at 35-100 µg/ml and culture for 2 hours at
    37°C,
10) centrifuge at 8000 g for 20 minutes at 4°C,
11) discard supernatant, resuspend cells in 50 ml of LB broth
    plus ampicillin and incubate 1 hour at 37°C,
12) plate cells on LB agar containing ampicillin,
13) harvest virions by method of Salivar et al. (SALI64).

The heat shock of step (7) can be done by dividing the 200 ml
into 100 200 µl aliquots in 1.5 ml plastic Eppendorf tubes.
It is possible to optimize the heat shock for other volumes
and kinds of container. It is important to: a) use all or
nearly all the vgDNA synthesized in ligation (this will
require large amounts of pLG7 backbone), b) use all or nearly
all the ligation mixture to transform cells, and c) culture
all or nearly all the transformants at high density. These
measures are directed at maintaining diversity.

IPTG is added to the growth medium at 2.0 mM (the optimal
level) and virions are harvested in the usual way. It is
important to collect virions in a way that samples all or
nearly all the transformants. Because F- cells are used in the
transformation, multiple infections do not pose a problem.

HHMb has a pI of 7.0 and we carry out chromatography at
pH 8.0 so that HHMb is slightly negative while BPTI and most
of its mutants are positive. HHMb is fixed (Sec. V.F) to a
2.0 ml column on Affi-Gel 10(TM) or Affi-Gel 15(TM) at 4.0 mg/ml
support matrix, the same density that is optimal for a column
supporting trp.

We note that charge repulsion between BPTI and HHMb
should not be a serious problem and does not impose any
constraints on ions or solutes allowed as eluants. Neither
BPTI nor HHMb has special requirements that constrain choice
of eluants. The eluant of choice is KCl in varying
concentrations.

To remove variants of BPTI with strong, indiscriminate
binding for any protein or for the support matrix, we pass the
variegated population of virions over a column that supports
bovine serum albumin (BSA) before loading the population onto
the {HHMb} column. Affi-Gel 10(TM) or Affi-Gel 15(TM) is used to
immobilize BSA at the highest level the matrix will support.
A 10.0 ml column is loaded with 5.0 ml of
Affi-Gel-linked-BSA; this column, called {BSA}, has V_v = 5.0
ml. The variegated population of virions containing 10¹² pfu
in 1 ml (0.2 x V_v) of 10 mM KCl, 1 mM phosphate, pH 8.0 buffer
is applied to {BSA}. We wash {BSA} with 4.5 ml (0.9 x V_v) of
50 mM KCl, 1 mM phosphate, pH 8.0 buffer. The wash with 50 mM
salt will elute virions that adhere slightly to BSA but not
virions with strong binding. The pooled effluent of the {BSA}
column is 5.5 ml of approximately 13 mM KCl.

The column {HHMb} is first blocked by treatment with 10¹¹
virions of M13(am429) in 100 µl of 10 mM KCl buffered to pH
8.0 with phosphate; the column is washed with the same buffer
until OD280 returns to base line or 2 x V_v have passed through
the column, whichever comes first. The pooled effluent from
{BSA} is added to {HHMb} in 5.5 ml of 13 mM KCl, 1 mM
phosphate, pH 8.0 buffer. The column is eluted in the
following way:

1) 10 mM KCl buffered to pH 8.0 with phosphate, until
optical density at 280 nm falls to base line or 2 x V_v,
whichever is first (effluent discarded),

2) a gradient of 10 mM to 2 M KCl in 3 x V_v, pH held at 8.0
with phosphate (30-100 µl fractions),

3) a gradient of 2 M to 5 M KCl in 3 x V_v, phosphate buffer
to pH 8.0 (30-100 µl fractions),

4) constant 5 M KCl plus 0 to 0.8 M guanidinium Cl in 2 x V_v,
with phosphate buffer to pH 8.0 (20-100 µl fractions),
and

5) constant 5 M KCl plus 0.8 M guanidinium Cl in 1 x V_v with
phosphate buffer to pH 8.0 (10-100 µl fractions).
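The five elution stages can be written down as a small schedule. The following Python sketch is illustrative only: the names STAGES and eluant_at are hypothetical, the gradients are assumed to be linear in volume (the text gives only their end points), and stage 1 is taken at its maximum length of 2 x V_v.

```python
# Elution stages for the {HHMb} column, measured in units of V_v.
# Each stage: (length in V_v, KCl start M, KCl end M, GuCl start M, GuCl end M).
# Linear gradients are an assumption of this sketch.
STAGES = [
    (2.0, 0.010, 0.010, 0.0, 0.0),  # 1) 10 mM KCl wash (at most 2 x V_v)
    (3.0, 0.010, 2.0,   0.0, 0.0),  # 2) 10 mM -> 2 M KCl
    (3.0, 2.0,   5.0,   0.0, 0.0),  # 3) 2 M -> 5 M KCl
    (2.0, 5.0,   5.0,   0.0, 0.8),  # 4) 5 M KCl, 0 -> 0.8 M guanidinium Cl
    (1.0, 5.0,   5.0,   0.8, 0.8),  # 5) 5 M KCl, 0.8 M guanidinium Cl
]


def eluant_at(volumes_passed):
    """Return (KCl molarity, guanidinium Cl molarity) after the given
    number of void volumes has passed through the column."""
    if volumes_passed < 0:
        raise ValueError("volume must be non-negative")
    start = 0.0
    for length, k0, k1, g0, g1 in STAGES:
        if volumes_passed <= start + length:
            frac = (volumes_passed - start) / length
            return (k0 + frac * (k1 - k0), g0 + frac * (g1 - g0))
        start += length
    raise ValueError("beyond the end of the elution program")


if __name__ == "__main__":
    for v in (0.0, 3.5, 6.5, 9.0, 11.0):
        kcl, gucl = eluant_at(v)
        print(f"{v:5.1f} x V_v: KCl {kcl:.3f} M, guanidinium Cl {gucl:.2f} M")
```

Expressing the program in multiples of V_v keeps the sketch independent of the actual column size, which the text states for {BSA} but not for {HHMb}.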

In addition to the elution fractions, a sample is removed from
the column and used as an inoculum for phage-sensitive Sup⁻
cells (Sec. V). A sample of 4 µl from each fraction is plated
on phage-sensitive Sup⁻ cells. Fractions that yield too many
colonies to count are replated at lower dilution. An
approximate titre of each fraction is calculated. Starting
with the last fraction and working toward the first fraction
that was titered, we pool fractions until approximately 10⁹
phage are in the pool, i.e. about 1 part in 1000 of the phage
applied to the column. This population is infected into 3·10¹¹
phage-sensitive PE384 in 300 ml of LB broth. The very low
multiplicity of infection (moi) is chosen to reduce the
possibility of multiple infection. After thirty minutes,
viable phage have entered recipient cells but have not yet
begun to produce new phage. Phage-borne genes are expressed at
this phase, and we can add ampicillin that will kill
uninfected cells. These cells still carry F-pili and will
absorb phage, helping to prevent multiple infections.
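The pooling rule (start with the last fraction, work toward the first, stop at about 10⁹ phage) and the resulting multiplicity of infection can be sketched as follows. The function names and the example titres are hypothetical and serve only to illustrate the bookkeeping.

```python
def pool_from_last(titres, target=1e9):
    """Pool fractions from the last toward the first until the pool
    holds at least `target` phage.

    titres: estimated phage per fraction, in elution order.
    Returns (indices of the pooled fractions, total phage pooled).
    """
    pooled, total = [], 0.0
    for i in range(len(titres) - 1, -1, -1):
        if total >= target:
            break
        pooled.append(i)
        total += titres[i]
    return pooled, total


def multiplicity_of_infection(phage, cells):
    """Phage particles per recipient cell."""
    return phage / cells


if __name__ == "__main__":
    # Made-up titres: early fractions carry most of the phage.
    titres = [5e11, 3e11, 1e10, 6e8, 3e8, 2e8]
    fractions, total = pool_from_last(titres)
    print("pooled fractions:", fractions, "total phage:", total)
    # About 10^9 phage into 3*10^11 cells, as in the text.
    print("moi:", multiplicity_of_infection(1e9, 3e11))
```

With 10⁹ phage and 3·10¹¹ cells the multiplicity of infection is roughly 0.003, which is why multiple infection of a single cell is unlikely.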
If multiple infection should pose a problem that cannot
be solved by growth at low multiplicity of infection on F⁺ cells,
the following procedure can be employed to obviate the
problem. Virions obtained from the affinity separation are
infected into F⁺ E. coli and cultured to amplify the genetic
messages (Sec. V). CCC DNA is obtained either by harvesting
RF DNA or by in vitro extension of primers annealed to ss
phage DNA. The CCC DNA is used to transform F⁻ cells at a high
ratio of cells to DNA. Individual virions obtained in this
way should bear only proteins encoded by the DNA within.

The phagemid population is grown and chromatographed
three times and then examined for SBDs (Sec. V). In each
separation cycle, phage from the last three fractions that
contain viable phage are pooled with phage obtained by
removing some of the support matrix as an inoculum. At each
cycle, about 10¹² phage are loaded onto the column and about
10⁹ phage are cultured for the next separation cycle. After
the third separation cycle, SBD colonies are picked from the
last fraction that contained viable phage.

Each of the SBDs is cultured and tested for retention on
a Pep-Tie column supporting HHMb. The phage showing the
greatest retention on the Pep-Tie {HHMb} column is selected. This
SBD, designated SBD1, becomes the parental amino-acid sequence
for the second variegation cycle.

Assume for the sake of argument that, in SBD1, R40
changed to D, I42 changed to Q, A50 changed to E, L52
remained L, and A71 changed to W (see Table 38). If so, a
rational plan for the second round of variegation would be
that which is set forth in Table 39. The residues to be varied
are chosen by: a) choosing some of the residues in the
principal set that were not varied in the first round (viz.
residues 42, 44, 51, 54, 55, 72, or 75 of the fusion), and b)
choosing some residues in the secondary set. Residues 51, 54,
55, and 72 are varied through all twenty amino acids and,
unavoidably, stop. Residue 44 is only varied between Y and F.
Some residues in the secondary set are varied through a
restricted range, primarily to allow different charges (+, 0,
-) to appear. Residue 38 is varied through K, R, E, or G.
Residue 41 is varied through I, V, K, or E. Residue 43 is
varied through R, S, G, N, K, D, E, T, or A.

Now assume that in the most successful SBD of the second
round of variegation (SBD-21), residue 38 (K15 of BPTI)
changed to E, 41 becomes V, 43 goes to N, 44 goes to F, 51
goes to F, 54 goes to S, 55 goes to A, and 72 goes to Q (see
Table 40). A third round of variation is illustrated in Table
41; eight amino acids are varied. Those in the principal set,
residues 40, 55, and 57, are varied through all twenty amino
acids. Residue 32 is varied through P, Q, T, K, A, or E.
Residue 34 is varied through T, P, Q, K, A, or E. Residue 44
is varied through F, L, Y, C, W, or stop. Residue 50 is
varied through E, K, or Q. Residue 52 is varied through L, F,
I, M, or V. The result of this variation is shown in Table
42.

This example is hypothetical. It is anticipated that
more variegation cycles will be needed to achieve dissociation
constants of 10⁻⁸ M. It is also possible that more than three
separation cycles will be needed in some variegation cycles. Real
DNA chemistry and DNA synthesizers may have larger errors than
our hypothetical 5%. If S_err > 0.05, then we may not be able
to vary six residues at once. Variation of 5 residues at once
is certainly possible.
EXAMPLE XII 

DESIGN AND MUTAGENESIS OF A CLASS 1 MINI-PROTEIN

To obtain a library of binding domains that are
conformationally constrained by a single disulfide, we insert
DNA coding for the following family of mini-proteins into the
gene coding for a suitable OSP.

X1-X2-C-X3-X4-X5-X6-C-X7-X8 (SEQ ID NO:19)
      |_____________|

where |___| indicates disulfide bonding; this mini-protein
is depicted in Figure 3. Disulfides normally do not form
between cysteines that are consecutive on the polypeptide
chain. One or more of the residues indicated above as Xn will
be varied extensively to obtain novel binding. There may be
one or more amino acids that precede X1 or follow X8; however,
these additional residues will not be significantly
constrained by the diagrammed disulfide bridge, and it is less
advantageous to vary these remote, unbridged residues. The
last X residue is connected to the OSP of the genetic package.

X1, X2, X3, X4, X5, X6, X7, and X8 can be varied
independently; i.e. a different scheme of variegation could be
used at each position. X1 and X8 are the least constrained
residues and may be varied less than other positions.

X1 and X8 can be, for example, one of the amino acids [E,
K, T, and A]; this set of amino acids is preferred because: a)
the possibility of positively charged, negatively charged, and
neutral amino acids is provided, b) these amino acids can be
provided in 1:1:1:1 ratio via the codon RMG (R = equimolar A
and G, M = equimolar A and C), and c) these amino acids allow
proper processing by signal peptidases.
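The amino-acid set delivered by a variegated codon such as RMG can be checked by expanding the IUB ambiguity codes against the standard genetic code. The sketch below is an illustration, not part of the described method; the function name expand is hypothetical, equimolar bases are assumed at every mixed position, and '*' is used here to mark a stop codon.

```python
from collections import Counter
from itertools import product

# Standard genetic code, bases ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Single-letter IUB codes for mixed bases.
IUB = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "ACGT",
}


def expand(vg_codon):
    """Count how many DNA codons of a variegated codon encode each
    amino acid ('*' marks a stop codon)."""
    codons = ("".join(c) for c in product(*(IUB[b] for b in vg_codon)))
    return Counter(CODON_TABLE[c] for c in codons)


if __name__ == "__main__":
    for vg in ("RMG", "NNT", "NNK"):
        counts = expand(vg)
        n_aa = len([aa for aa in counts if aa != "*"])
        print(vg, dict(sorted(counts.items())), "amino acids:", n_aa)
```

For RMG the expansion gives K, T, E, and A once each, which is the 1:1:1:1 ratio stated above.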
One option for variegation of X2, X3, X4, X5, X6, and X7 is
to vary all of these in the same way. For example, each of X2,
X3, X4, X5, X6, and X7 can be chosen from the set [F, S, Y, C,
L, P, H, R, I, T, N, V, A, D, and G], which is encoded by the
mixed codon NNT. Tables 10 and 130 compare libraries in
which six codons have been varied either by NNT or NNK codons.

NNT encodes 15 different amino acids and only 16 DNA
sequences. Thus, there are 1.139·10⁷ amino-acid sequences,
no stops, and only 1.678·10⁷ DNA sequences. A library of 10⁸
independent transformants will contain 99% of all possible
sequences. The NNK library contains 6.4·10⁷ amino-acid sequences, but
complete sampling requires a much larger number of independent
transformants.
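The library sizes quoted for six varied codons, and the fraction of sequences expected in a library of a given size, can be reproduced with a few lines of Python. The coverage estimate below assumes that every DNA sequence is equally likely and that transformants are independent (a Poisson approximation); this model is an assumption of the sketch and is not spelled out in the text, although it agrees with the 99% figure.

```python
import math


def expected_coverage(n_transformants, n_dna_sequences):
    """Expected fraction of DNA sequences present at least once,
    assuming equiprobable sequences and independent transformants."""
    return 1.0 - math.exp(-n_transformants / n_dna_sequences)


# Six varied codons.
NNT_AA, NNT_DNA = 15 ** 6, 16 ** 6   # 15 amino acids, 16 codons, no stop
NNK_AA, NNK_DNA = 20 ** 6, 32 ** 6   # 20 amino acids, 32 codons (one stop)

if __name__ == "__main__":
    print(f"NNT: {NNT_AA:.3e} amino-acid, {NNT_DNA:.3e} DNA sequences")
    print(f"NNK: {NNK_AA:.3e} amino-acid, {NNK_DNA:.3e} DNA sequences")
    for name, n_dna in (("NNT", NNT_DNA), ("NNK", NNK_DNA)):
        cov = expected_coverage(1e8, n_dna)
        print(f"{name}: 1e8 transformants cover {cov:.1%} of DNA sequences")
```

Under this model 10⁸ transformants cover more than 99% of the NNT library but less than 10% of the NNK DNA sequences, which illustrates why the NNK library needs far more transformants.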

EXAMPLE XIII 
A CYS::HELIX::TURN::STRAND::CYS UNIT

The parental Class 2 mini-protein may be a naturally-occurring
Class 2 mini-protein. It may also be a domain of a
larger protein whose structure satisfies or may be modified so
as to satisfy the criteria of a Class 2 mini-protein. The
modification may be a simple one, such as the introduction of
a cysteine (or a pair of cysteines) into the base of a hairpin
structure so that the hairpin may be closed off with a
disulfide bond, or a more elaborate one, such as the
modification of intermediate residues so as to achieve the
hairpin structure. The parental Class 2 mini-protein may also
be a composite of structures from two or more
naturally-occurring proteins, e.g., an α helix of one protein and a β
strand of a second protein.

One mini-protein motif of potential use comprises a
disulfide loop enclosing a helix, a turn, and a return strand.
Such a structure could be designed or it could be obtained
from a protein of known 3D structure. Scorpion neurotoxin,
variant 3 (ALMA83a, ALMA83b) (hereafter ScorpTx), contains a
structure diagrammed in Figure 15 that comprises a helix
(residues N22 through N33), a turn (residues 33 through 35),
and a return strand (residues 36 through 41). ScorpTx
contains disulfides that join residues 12-65, 16-41, 25-46,
and 29-48. CYS25 and CYS41 are quite close and could be joined
by a disulfide without deranging the main chain. Figure 15
shows CYS25 joined to CYS41. In addition, CYS29 has been
changed to GLN. It is expected that a disulfide will form
between 25 and 41 and that the helix shown will form; we know
that the amino-acid sequence shown is highly compatible with
this structure. The presence of GLY35, GLY36, and GLY39 gives
the turn and extended strand sufficient flexibility to
accommodate any changes needed around CYS41 to form the
disulfide.

From examination of this structure (as found in entry
1SN3 of the Brookhaven Protein Data Bank), we see that the
following sets of residues would be preferred for variegation:

SET 1
     Residue  Codon  Allowed amino acids              Naa/Ndna
1)   T27      NNG    L²,R²,M,V,S,P,T,A,Q,K,E,W,G,.    13/15
2)   E28      VHG    L,M,V,P,T,A,Q,K,E                9/9
3)   A31      VHG    L,M,V,P,T,A,Q,K,E                9/9
4)   32       VHG    L,M,V,P,T,A,Q,K,E                9/9
5)   G24      NNG    L²,R²,M,V,S,P,T,A,Q,K,E,W,G,.    13/15
6)   E23      VHG    L,M,V,P,T,A,Q,K,E                9/9
7)   Q34      VAS    H,Q,N,K,E,D                      6/6

Note: Exponents on amino acids indicate multiplicity of
codons.

Positions 27, 28, 31, 32, 24, and 23 comprise one face of
the helix. At each of these locations we have picked a
variegating codon that a) includes the parental amino acid, b)
includes a set of residues having a predominance of
helix-favoring residues, c) provides for a wide variety of amino
acids, and d) leads to as even a distribution as possible.
Position 34 is part of a turn. The side group of residue 34
could interact with molecules that contact the side groups of
residues 27, 28, 31, 32, 24, and 23. Thus we allow
variegation here and provide amino acids that are compatible
with turns. The variegation shown leads to 6.65·10⁶ amino-acid
sequences encoded by 8.85·10⁶ DNA sequences.
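The totals for SET 1 are simply the products of the per-position Naa and Ndna entries. A short check in Python follows; the list name SET1 and its layout are hypothetical, and the fourth row is labelled only by its position because the parental residue at position 32 is not legible in the table.

```python
from math import prod

# (position, variegated codon, Naa, Ndna) taken from SET 1.
SET1 = [
    ("T27", "NNG", 13, 15),
    ("E28", "VHG", 9, 9),
    ("A31", "VHG", 9, 9),
    ("32",  "VHG", 9, 9),
    ("G24", "NNG", 13, 15),
    ("E23", "VHG", 9, 9),
    ("Q34", "VAS", 6, 6),
]

# Positions vary independently, so the counts multiply.
n_aa = prod(row[2] for row in SET1)
n_dna = prod(row[3] for row in SET1)

if __name__ == "__main__":
    print(f"amino-acid sequences: {n_aa} ({n_aa:.2e})")
    print(f"DNA sequences:        {n_dna} ({n_dna:.2e})")
```

The products are 6,652,854 and 8,857,350, in line with the 6.65·10⁶ and 8.85·10⁶ quoted above.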



SET 2
     Residue  Codon  Allowed amino acids              Naa/Ndna
1)   D26      VHS    L²,I,M,V²,P²,T²,A²,H,Q,N,K,D,E   13/18
2)   T27      NNG    L²,R²,M,V,S,P,T,A,Q,K,E,W,G,.    13/15
3)   K30      VHG    K,E,Q,P,T,A,L,M,V                9/9
4)   A31      VHG    K,E,Q,P,T,A,L,M,V                9/9
5)   32       VHG    L,M,V,P,T,A,Q,K,E                9/9
6)   S37      RRT    S,N,D,G                          4/4
7)   Y38      NHT    Y,S,F,H,P,L,N,T,I,D,A,V          9/9

Positions 26, 27, 30, 31, and 32 are variegated so as to
enhance helix-favoring amino acids in the population.
Residues 37 and 38 are in the return strand so that we pick
different variegation codons. This variegation allows 4.43·10⁶
amino-acid sequences and 7.08·10⁶ DNA sequences. Thus a
library that embodies this scheme can be sampled very
efficiently.
EXAMPLE XIV

DESIGN AND MUTAGENESIS OF CLASS 3 MINI-PROTEIN

Two Disulfide Bond Parental Mini-Proteins

Mini-proteins with two disulfide bonds may be modelled
after the α-conotoxins, e.g., GI, GIA, GII, MI, and SI. These
have the following conserved structure (SEQ ID NOs:20-31):

          1 2         1'        2'
(1-2 AAs)-C-C-(3 AAs)-C-(5 AAs)-C-(0-5 AAs)
          |_|_________|         |
            |___________________|



Hashimoto et al. (HASH85) reported synthesis of twenty-four
analogues of α-conotoxins GI, GII, and MI. Using the
numbering scheme for GI (CYS at positions 2, 3, 7, and 13),
Hashimoto et al. reported alterations at 4, 8, 10, and 12 that
allow the proteins to be toxic. Almquist et al. (ALMQ89)
synthesized [des-GLU1] α-conotoxin GI and twenty analogues.
They found that substituting GLY for PRO5 gave rise to two
isomers, perhaps related to different disulfide bonding. They
found a number of substitutions at residues 8 through 11 that
allowed the protein to be toxic. Zafaralla et al. (ZAFA88)
found that substituting PRO at position 9 gives an active
protein. Each of the groups cited used only in vivo toxicity
as an assay for the activity. From such studies, one can
infer that an active protein has the parental 3D structure,
but one cannot infer that an inactive protein lacks the
parental 3D structure.

Pardi et al. (PARD89) determined by NMR the 3D structure of
α-conotoxin GI obtained from venom. Kobayashi et al.
(KOBA89) have reported a 3D structure of synthetic α-conotoxin
GI from NMR data which agrees with that of PARD89. We refer
to Figure 5 of Pardi et al.

Residue GLU1 is known to accommodate GLU, ARG, and ILE in
known analogues or homologues. A preferred variegation codon
is NNG, which allows the set of amino acids
[L²,R²,M,V,S,P,T,A,Q,K,E,W,G,<stop>]. From Figure 5 of Pardi
et al. we see that the side group of GLU1 projects into the
same region as the strand comprising residues 9 through 12.
Residues 2 and 3 are cysteines and are not to be varied. The
side group of residue 4 points away from residues 9 through
12; thus we defer varying this residue until a later round.
PRO5 may be needed to cause the correct disulfides to form;
when GLY was substituted here the peptide folded into two
forms, neither of which is toxic. Varying PRO5 is allowed,
but not preferred in the first round.

No substitutions at ALA6 have been reported. A preferred
variegation codon is RMG, which gives rise to ALA, THR, LYS,
and GLU (small hydrophobic, small hydrophilic, positive, and
negative). CYS7 is not varied. We prefer to leave GLY8 as is,
although a homologous protein having ALA8 is toxic. Homologous
proteins having various amino acids at position 9 are toxic;
thus, we use an NNT variegation codon, which allows
F,S²,Y,C,L,P,H,R,I,T,N,V,A,D,G. We use NNT at positions 10,
11, and 12 as well. At position 14, following the fourth CYS,
we allow ALA, THR, LYS, or GLU (via an RMG codon). This
variegation allows 1.053·10⁷ amino-acid sequences, encoded by
1.68·10⁷ DNA sequences. Libraries having 2.0·10⁷, 3.0·10⁷, and
5.0·10⁷ independent transformants will, respectively, display
≈70%, ≈83%, and ≈95% of the allowed sequences. Other
variegations are also appropriate. Concerning α-conotoxins,
see, inter alia, ALMQ89, CRUZ85, GRAY83, GRAY84, and PARD89.
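Going the other way, one may ask how many independent transformants are needed to display a chosen fraction of the allowed sequences. The sketch below inverts the relation coverage = 1 - exp(-N/S); that relation is an assumed model (equiprobable sequences, independent transformants), chosen because it is consistent with the ≈70%, ≈83%, and ≈95% figures quoted above. The function name is hypothetical.

```python
import math


def transformants_needed(n_dna_sequences, target_fraction):
    """Independent transformants N such that 1 - exp(-N/S) equals the
    target fraction, where S is the number of DNA sequences."""
    if not 0.0 <= target_fraction < 1.0:
        raise ValueError("target_fraction must lie in [0, 1)")
    return -n_dna_sequences * math.log(1.0 - target_fraction)


if __name__ == "__main__":
    s = 1.68e7  # DNA sequences in the conotoxin variegation above
    for target in (0.70, 0.83, 0.95, 0.99):
        n = transformants_needed(s, target)
        print(f"{target:.0%} coverage needs about {n:.2e} transformants")
```

Because the required library size grows with -ln(1 - f), each further gain in coverage becomes more expensive: under this model, going from 95% to 99% takes roughly half again as many transformants.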

The parental mini-protein may instead be one of the
proteins designated "Hybrid-I" and "Hybrid-II" by Pease et al.
(PEAS90); cf. Figure 4 of PEAS90. One preferred set of
residues to vary for either protein consists of:

Parental     Variegated   Allowed                          AA seqs/
Amino acid   Codon        Amino acids                      DNA seqs
A5           RVT          A,D,G,T,N,S                      6/6
P6           VYT          P,T,A,L,I,V                      6/6
E7           RRS          E,D,N,K,S,R,G²                   7/8
T8           VHG          T,P,A,L,M,V,Q,K,E                9/9
A9           VHG          A,T,P,L,M,V,Q,K,E                9/9
A10          RMG          A,E,K,T                          4/4
K12          VHG          K,Q,E,T,P,A,L,M,V                9/9
Q16          NNG          L²,R²,S,W,P,Q,M,T,K,V,A,E,G      13/15

This provides 9.55·10⁶ amino-acid sequences encoded by 1.26·10⁷
DNA sequences. A library comprising 5.0·10⁷ transformants
allows expression of 98.2% of all possible sequences. At each
position, the parental amino acid is allowed.

At position 5 we provide amino acids that are compatible
with a turn. At position 6 we allow ILE and VAL because they
have branched β carbons and make the chain rigid. At
position 7 we allow ASP, ASN, and SER, which often appear at the
amino termini of helices. At positions 8 and 9 we allow
several helix-favoring amino acids (ALA, LEU, MET, GLN, GLU,
and LYS) that have differing charges and hydrophobicities
because these are part of the helix proper. Position 10 is
further around the edge of the helix, so we allow a smaller
set (ALA, THR, LYS, and GLU). This set not only includes 3
helix-favoring amino acids plus THR, which is well tolerated, but
also allows positive, negative, and neutral hydrophilic residues. The
side groups of 12 and 16 project into the same region as the
residues already recited. At these positions we allow a wide
variety of amino acids with a bias toward helix-favoring amino
acids.

The parental mini-protein may instead be a polypeptide
composed of residues 9-24 and 31-40 of aprotinin and
possessing two disulfides (Cys9-Cys22 and Cys14-Cys38). Such
a polypeptide would have the same disulfide bond topology as
α-conotoxin, and its two bridges would have spans of 12 and
17, respectively.
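The two spans can be verified by counting the residues that lie between the bridged cysteines in the composite chain. In the sketch below, "span" is taken to mean the number of intervening residues; this reading is an assumption, adopted because it reproduces the quoted values of 12 and 17. The function name is hypothetical.

```python
def bridge_span(kept_residues, cys_a, cys_b):
    """Number of residues lying strictly between two bridged cysteines
    in a chain built from the given (parental) residue numbers."""
    order = list(kept_residues)
    return abs(order.index(cys_b) - order.index(cys_a)) - 1


# Residues 9-24 and 31-40 of aprotinin, joined into one chain.
KEPT = list(range(9, 25)) + list(range(31, 41))

if __name__ == "__main__":
    print("Cys9-Cys22 span: ", bridge_span(KEPT, 9, 22))
    print("Cys14-Cys38 span:", bridge_span(KEPT, 14, 38))
```

Deleting residues 25 through 30 shortens the Cys14-Cys38 loop from 23 intervening residues in aprotinin to 17 in the mini-protein, while the Cys9-Cys22 loop is unchanged.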

Residues 23, 24 and 31 are variegated to encode the amino
acid residue set [G,S,R,D,N,H,P,T,A] so that a sequence that
favors a turn of the necessary geometry is found. We use
trypsin or anhydrotrypsin as the affinity molecule to enrich
for GPs that display a mini-protein that folds into a stable
structure similar to BPTI in the P1 region.
Three Disulfide Bond Parental Mini-Proteins

The cone snails (Conus) produce venoms (conotoxins) which
are 10-30 amino acids in length and exceptionally rich in
disulfide bonds. They are therefore archetypal mini-proteins.
Novel mini-proteins with three disulfide bonds may be
modelled after the μ- (GIIIA, GIIIB, GIIIC) or ω- (GVIA, GVIB,
GVIC, GVIIA, GVIIB, MVIIA, MVIIB, etc.) conotoxins. The
μ-conotoxins have the following conserved structure (SEQ ID
NO:32):

        1 2         3         1'        2'3'
(2 AAs)-C-C-(5 AAs)-C-(4 AAs)-C-(4 AAs)-C-C-AA
        |_|_________|_________|         | |
          |_________|___________________| |
                    |_____________________|



No 3D structure of a μ-conotoxin has been published.
Hidaka et al. (HIDA90) have established the connectivity of
the disulfides. The following diagram depicts geographutoxin
I (also known as μ-conotoxin GIIIA), whose sequence is SEQ ID
NO:33.



[Diagram of the folded chain; residues and disulfides as labelled:]

R1-D2-C3-C4-T5-P6-P7-K8-K9-C10-K11-D12-R13-Q14-C15-K16-P17-Q18-R19-C20-C21-A22

Disulfides: C3::C15, C4::C20, C10::C21

The connection from R19 to C20 could go over or under the
strand from Q14 to C15. One preferred form of variegation is
to vary the residues in one loop. Because the longest loop
contains only five amino acids, it is appropriate to also vary
the residues connected to the cysteines that form the loop.
For example, we might vary residues 5 through 9 plus 2, 11,
19, and 22. Another useful variegation would be to vary
residues 11-14 and 16-19, each through eight amino acids.
Concerning μ-conotoxins, see BECK89b, BECK89c, CRUZ89, and
HIDA90.

The ω-conotoxins may be represented as follows (SEQ ID
NO:34 through 39):

1         2         3 1'          2'          3'
C-(6 AAs)-C-(6 AAs)-C-C-(2-3 AAs)-C-(4-6 AAs)-C
|_________|_________|_|           |           |
          |_________|_____________|           |
                    |_________________________|

The King Kong peptide has the same disulfide arrangement as
the ω-conotoxins but a different biological activity.

Woodward et al. (WOOD90) report the sequences of three
homologous proteins from C. textile. Within the mature toxin
domain, only the cysteines are conserved. The spacing of the
cysteines is exactly conserved, but no other position has the
same amino acid in all three sequences and only a few
positions show even pair-wise matches. Thus we conclude that
all positions (except the cysteines) may be substituted freely
with a high probability that a stable disulfide structure will
form. Concerning ω-conotoxins, see HILL89 and SUNX87.

Another mini-protein which may be used as a parental
binding domain is the Cucurbita maxima trypsin inhibitor I
(CMTI-I); CMTI-III is also appropriate. They are members of
the squash family of serine protease inhibitors, which also
includes inhibitors from summer squash, zucchini, and
cucumbers (WIEC85). McWherter et al. (MCWH89) describe
synthetic sequence-variants of the squash-seed protease
inhibitors that have affinity for human leukocyte elastase and
cathepsin G. Of course, any member of this family might be
used.

CMTI-I is one of the smallest proteins known, comprising
only 29 amino acids held in a fixed conformation by three
disulfide bonds. The structure has been studied by Bode and
colleagues using both X-ray diffraction (BODE89) and NMR
(HOLA89a,b). CMTI-I is of ellipsoidal shape; it lacks helices
or β-sheets, but consists of turns and connecting short
polypeptide stretches. The disulfide pairing is Cys3-Cys20,
Cys10-Cys22, and Cys16-Cys28. In the CMTI-I:trypsin complex
studied by Bode et al., 13 of the 29 inhibitor residues are in
direct contact with trypsin; most of them are in the primary
binding segment Val2 (P4)-Glu9 (P4'), which contains the
reactive site bond Arg5 (P1)-Ile6 and is in a conformation
observed also for other serine proteinase inhibitors.

CMTI-I has a K_i for trypsin of ~1.5·10⁻¹² M. McWherter et
al. suggested substitution of "moderately bulky hydrophobic
groups" at P1 to confer HLE specificity. They found that a
wider set of residues (VAL, ILE, LEU, ALA, PHE, MET, and GLY)
gave detectable binding to HLE. For cathepsin G, they
expected bulky (especially aromatic) side groups to be
strongly preferred. They found that PHE, LEU, MET, and ALA
were functional by their criteria; they did not test TRP, TYR,
or HIS. (Note that ALA has the second smallest side group
available.)

A preferred initial variegation strategy would be to vary
some or all of the residues ARG1, VAL2, PRO4, ARG5, ILE6, LEU7,
MET8, GLU9, LYS11, HIS25, GLY26, TYR27, and GLY29. If the target
were HNE, for example, one could synthesize DNA embodying the
following possibilities:

            vg      Allowed                    #AA seqs/
Parental    Codon   amino acids                #DNA seqs
ARG1        VNT     R,S,L,P,H,I,T,N,V,A,D,G    12/12
VAL2        NWT     V,I,L,F,Y,H,N,D            8/8
PRO4        VYT     P,L,T,I,A,V                6/6
ARG5        VNT     R,S,L,P,H,I,T,N,V,A,D,G    12/12
ILE6        NNK     all 20                     20/31
LEU7        VWG     L,Q,M,K,V,E                6/6
TYR27       NAS     Y,H,Q,N,K,D,E              7/8

This allows about 5.81·10⁶ amino-acid sequences encoded by
about 1.03·10⁷ DNA sequences. A library comprising 5.0·10⁷
independent transformants would give ≈99% of the possible
sequences. Other variegation schemes could also be used.

Other inhibitors of this family include:

Trypsin inhibitor I from Citrullus vulgaris (OTLE87),
Trypsin inhibitor II from Bryonia dioica (OTLE87),
Trypsin inhibitor I from Cucurbita maxima (in OTLE87),
Trypsin inhibitor III from Cucurbita maxima (in OTLE87),
Trypsin inhibitor IV from Cucurbita maxima (in OTLE87),
Trypsin inhibitor II from Cucurbita pepo (in OTLE87),
Trypsin inhibitor III from Cucurbita pepo (in OTLE87),
Trypsin inhibitor IIb from Cucumis sativus (in OTLE87),
Trypsin inhibitor IV from Cucumis sativus (in OTLE87),
Trypsin inhibitor II from Ecballium elaterium (FAVE89), and
Inhibitor CM-1 from Momordica repens (in OTLE87).



Another family of mini-proteins that may be used as initial
potential binding domains is the heat-stable enterotoxins
derived from some enterotoxigenic E. coli, Citrobacter
freundii, and other bacteria (GUAR89). These mini-proteins
are known to be secreted from E. coli and are extremely
stable. Works related to synthesis, cloning, expression and
properties of these proteins include: BHAT86, SEKI85, SHIM87,
TAKA85, TAKE90, THOM85a,b, YOSH85, DALL90, DWAR89, GARI87,
GUZM89, GUZM90, HOUG84, KUBO89, KUPE90, OKAM87, OKAM88, and
OKAM90.

Another preferred IPBD is crambin or one of its
homologues, the phoratoxins and ligatoxins (LECO87). These
proteins are secreted in plants. The 3D structure of crambin
has been determined. NMR data on homologues indicate that the
3D structure is conserved. Residues thought to be on the
surface of crambin, phoratoxin, or ligatoxin are preferred
residues to vary.

EXAMPLE XV 

A MINI-PROTEIN HAVING A CROSS-LINK CONSISTING OF CU(II), ONE
CYSTEINE, TWO HISTIDINES, AND ONE METHIONINE.

Sequences such as
HIS-ASN-GLY-MET-Xaa-Xaa-Xaa-Xaa-Xaa-Xaa-HIS-ASN-GLY-CYS (SEQ
ID NO:40) and
CYS-ASN-GLY-MET-Xaa-Xaa-Xaa-Xaa-Xaa-Xaa-HIS-ASN-GLY-HIS (SEQ
ID NO:41) are likely to combine with Cu(II) to form structures
as shown in the diagram:

       Xaa7--Xaa8
      /          \
   Xaa6          Xaa9
    |              |
   Xaa5          Xaa10
      \          /
      MET4    HIS11
     /    \  /    \
  GLY3     Cu     ASN12
   |      /  \      |
  ASN2-HIS1  CYS14-GLY13
        |      |
       NH2    COO

  SEQ ID NO:40

       Xaa7--Xaa8
      /          \
   Xaa6          Xaa9
    |              |
   Xaa5          Xaa10
      \          /
      MET4    HIS11
     /    \  /    \
  GLY3     Cu     ASN12
   |      /  \      |
  ASN2-CYS1  HIS14-GLY13
        |      |
       NH2    COO

  SEQ ID NO:41

Other arrangements of HIS, MET, HIS, and CYS along the chain
are also likely to form similar structures. The amino acids
ASN-GLY at positions 2 and 3 and at positions 12 and 13 give
the amino acids that carry the metal-binding ligands enough
flexibility for them to come together and bind the metal.
Other connecting sequences may be used, e.g., GLY-ASN,
SER-GLY, GLY-PRO, GLY-PRO-GLY, or PRO-GLY-ASN. It
is also possible to vary one or more residues in the loops
that join the first and second or the third and fourth
metal-binding residues. For example (SEQ ID NO:42),



            Xaa8--Xaa9
           /          \
        Xaa7          Xaa10
         |              |
        Xaa6          Xaa11
           \          /
           MET5    HIS12
          /    \  /    \
      Xaa4      Cu      ASN13
       |       /  \       |
      PRO3    /    \      |
         \   /      \     |
       GLY2-HIS1   CYS15-GLY14
             |       |
            NH2     COO

is likely to form the diagrammed structure for a wide variety
of amino acids at Xaa4. It is expected that the side groups
of Xaa4 and Xaa6 will be close together and on the surface of
the mini-protein.

The variable amino acids are held so that they have
limited flexibility. This cross-linkage has some differences
from the disulfide linkage. The separation between Cα4 and Cα11
is greater than the separation of the Cαs of a cystine. In
addition, the interaction of residues 1 through 4 and 11
through 14 with the metal ion is expected to limit the motion
of residues 5 through 10 more than a disulfide between residues
4 and 11. A single disulfide bond exerts strong distance
constraints on the α carbons of the joined residues, but very
little directional constraint on, for example, the vector from
N to C in the main chain.

For the desired sequence, the side groups of residues 5
through 10 can form specific interactions with the target.
Other numbers of variable amino acids, for example, 4, 5, 7,
or 3, are appropriate. Larger spans may be used when the
enclosed sequence contains segments having a high potential to
form α helices or other secondary structure that limits the
conformational freedom of the polypeptide main chain. Whereas
a mini-protein having four CYSs could form three distinct
pairings, a mini-protein having two HISs, one MET, and one CYS
can form only two distinct complexes with Cu. These two
structures are related by mirror symmetry through the Cu.
Because the two HISs are distinguishable, the structures are
different.
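The count of three distinct pairings for four cysteines can be confirmed by enumerating every way of splitting the cysteines into disulfide-bonded pairs. The Python sketch below is illustrative; the function name pairings and the labels C1 to C6 are hypothetical.

```python
def pairings(items):
    """Yield every way of splitting an even-sized list into unordered
    pairs (that is, every possible disulfide pairing)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail


if __name__ == "__main__":
    four = ["C1", "C2", "C3", "C4"]
    for p in pairings(four):
        print(p)
    six = ["C1", "C2", "C3", "C4", "C5", "C6"]
    print("pairings for six cysteines:", sum(1 for _ in pairings(six)))
```

The same enumeration gives fifteen possible pairings for six cysteines, which shows how quickly the number of alternative connectivities grows as disulfides are added.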

When such metal-containing mini-proteins are displayed
on filamentous phage, the cells that produce the phage can be
grown in the presence of the appropriate metal ion, or the
phage can be exposed to the metal only after they are
separated from the cells.
EXAMPLE XVI 

A MINI-PROTEIN HAVING A CROSS-LINK CONSISTING OF ZN(II) AND
FOUR CYSTEINES

A cross link similar to the one shown in Example XV is
exemplified by the zinc-finger proteins (GIBS88, GAUS87,
PARR88, FRAN87, CHOW87, HARD90). One family of zinc-fingers
has two CYS and two HIS residues in conserved positions that
bind Zn⁺⁺ (PARR88, FRAN87, CHOW87, EVAN88, BERG88, CHAV88).
Gibson et al. (GIBS88) review a number of sequences thought to
form zinc-fingers and propose a three-dimensional model for
these compounds. Most of these sequences have two CYS and two
HIS residues in conserved positions, but some have three CYS
and one HIS residue. Gauss et al. (GAUS87) also report a
zinc-finger protein having three CYS and one HIS residues that
bind zinc. Hard et al. (HARD90) report the 3D structure of a
protein that comprises two zinc-fingers, each of which has
four CYS residues. All of these zinc-binding proteins are
stable in the reducing intracellular environment.

One preferred example of a CYS:: zinc cross linked mini- 
10 protein comprises residues 440 to 461 of the sequence shown in 
Figure 1 of HARD90 . The resiudes 444 through 456 (SEQ ID 
NO: 43) may be variegated. One such variegation is as follows: 



Parental    Allowed                                    #AA / #DNA
SER444      SER, ALA                                     2 / 2
ASP445      ASP, ASN, GLU, LYS                           4 / 4
GLU446      GLU, LYS, GLN                                3 / 3
ALA447      ALA, THR, GLY, SER                           4 / 4
SER448      SER, ALA                                     2 / 2
GLY449      GLY, SER, ASN, ASP                           4 / 4
CYS450      CYS, PHE, ARG, LEU                           4 / 4
HIS451      HIS, GLN, ASN, LYS, ASP, GLU                 6 / 6
TYR452      TYR, PHE, HIS, LEU                           4 / 4
GLY453      GLY, SER, ASN, ASP                           4 / 4
VAL454      VAL, ALA, ASP, GLY, SER, ASN, THR, ILE       8 / 8
LEU455      LEU, HIS, ASP, VAL                           4 / 4
THR456      THR, ILE, ASN, SER                           4 / 4



This leads to 3.77·10^7 DNA sequences that encode the same
number of amino-acid sequences. A library having 1.0·10^8
independent transformants will display 93% of the allowed
sequences; 2.0·10^8 independent transformants will display 99.5%
of allowed sequences.
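
The arithmetic in the preceding paragraph can be checked with a short calculation. The sketch below is illustrative only. It multiplies the number of allowed amino acids at each variegated position and then estimates the displayed fraction, assuming that transformants sample the allowed sequences uniformly and independently (a Poisson approximation, which is an assumption of this sketch and is not stated in the text). The names `ALLOWED_COUNTS`, `library_size`, and `fraction_displayed` are hypothetical.

```python
import math

# Number of allowed amino acids at residues 444-456, read from the
# variegation table above (the #AA column).
ALLOWED_COUNTS = [2, 4, 3, 4, 2, 4, 4, 6, 4, 4, 8, 4, 4]


def library_size(counts):
    """Total number of distinct sequences: product of per-position counts."""
    return math.prod(counts)


def fraction_displayed(n_transformants, n_sequences):
    """Expected fraction of sequences sampled at least once.

    Assumes uniform, independent sampling (Poisson approximation).
    """
    return 1.0 - math.exp(-n_transformants / n_sequences)


size = library_size(ALLOWED_COUNTS)
print(f"allowed sequences: {size:.3g}")
for n in (1.0e8, 2.0e8):
    print(f"{n:.1e} transformants: {fraction_displayed(n, size):.3f}")
```

Under that sampling assumption the two library sizes quoted in the text reproduce the stated coverage figures to within rounding.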



Table 1: Single-letter codes.

Single-letter code is used for proteins:

a = ALA    c = CYS    d = ASP    e = GLU    f = PHE
g = GLY    h = HIS    i = ILE    k = LYS    l = LEU
m = MET    n = ASN    p = PRO    q = GLN    r = ARG
s = SER    t = THR    v = VAL    w = TRP    y = TYR
(illegible) = STOP    * = any amino acid
b = n or d
z = e or q
x = any amino acid



Single-letter IUB codes for DNA:

T, C, A, G stand for themselves
M for A or C
R for puRines A or G
W for A or T
S for C or G
Y for pYrimidines T or C
K for G or T
V for A, C, or G (not T)
H for A, C, or T (not G)
D for A, G, or T (not C)
B for C, G, or T (not A)
N for any base.
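
The IUB codes above lend themselves to a simple lookup. The following is a minimal sketch, not part of the original disclosure, that expands an ambiguous DNA string into every unambiguous sequence it encodes. The names `IUB` and `expand` are hypothetical.

```python
from itertools import product

# IUB ambiguity codes as listed in Table 1.
IUB = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "ACGT",
}


def expand(ambiguous):
    """Return every unambiguous sequence encoded by an ambiguous DNA string."""
    choices = [IUB[base] for base in ambiguous.upper()]
    return ["".join(combo) for combo in product(*choices)]


print(expand("AAR"))        # the two lysine codons
print(len(expand("NNS")))   # number of codons covered by NNS
```

Lower-case input such as `a.t.h` from Table 3 would need the dots removed first; the function upper-cases its argument but does not strip separators.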
Table 2: Preferred Outer-Surface Proteins

                Preferred
Genetic         Outer-Surface
Package         Protein          Reason for preference

M13             coat protein     a) exposed amino terminus,
                (gpVIII)         b) predictable post-
                                    translational
                                    processing,
                                 c) numerous copies in
                                    virion.
                                 d) fusion data available
                gp III           a) fusion data available.
                                 b) amino terminus exposed
                                 c) working example
                                    available.
PhiX174         G protein        a) known to be on virion
                                    exterior,
                                 b) small enough that
                                    the G-ipbd gene can
                                    replace H gene.
E. coli         LamB             a) fusion data available,
                                 b) non-essential.
                OmpC             a)
                                 b) non-essential; abundant
                OmpA             a) topological model
                                 b) non-essential; abundant
                                 c) homologues in other genera
                OmpF             a) topological model
                                 b) non-essential; abundant
                PhoE             a) topological model
                                 b) non-essential; abundant
                                 c) inducible
B. subtilis     CotC             a) no post-translational
spores                              processing,
                                 b) distinctive sequence
                                    that causes protein to
                                    localize in spore coat,
                                 c) non-essential.
                CotD             Same as for CotC.
Table 3: Ambiguous DNA for AA_seq2

Each entry gives the residue number, the amino acid (Table 1
code), and its ambiguous codon(s) (IUB codes).

  1 m A.T.G          2 k A.A.r          3 k A.A.r          4 s T.C.n/A.G.y
  5 l T.T.r/C.T.n    6 v G.T.n          7 l T.T.r/C.T.n    8 k A.A.r
  9 a G.C.n         10 s T.C.n/A.G.y   11 v G.T.n         12 a G.C.n
 13 v G.T.n         14 a G.C.n         15 t A.C.n         16 l T.T.r/C.T.n
 17 v G.T.n         18 p C.C.n         19 m A.T.G         20 l T.T.r/C.T.n
 21 s T.C.n/A.G.y   22 f T.T.y         23 a G.C.n         24 r C.G.n/A.G.r
 25 p C.C.n         26 d G.A.y         27 f T.T.y         28 c T.G.y
 29 l T.T.r/C.T.n   30 e G.A.r         31 p C.C.n         32 p C.C.n
 33 y T.A.y         34 t A.C.n         35 g G.G.n         36 p C.C.n
 37 c T.G.y         38 k A.A.r         39 a G.C.n         40 r C.G.n/A.G.r
 41 i A.T.h         42 i A.T.h         43 r C.G.n         44 y T.A.y
 45 f T.T.y         46 y T.A.y         47 n A.A.y         48 a G.C.n
 49 k A.A.r         50 a G.C.n         51 g G.G.n         52 l T.T.r/C.T.n
 53 c T.G.y         54 q C.A.r         55 t A.C.n         56 f T.T.y
 57 v G.T.n         58 y T.A.y         59 g G.G.n         60 g G.G.n
 61 c T.G.y         62 r C.G.n/A.G.r   63 a G.C.n         64 k A.A.r
 65 r C.G.n/A.G.r   66 n A.A.y         67 n A.A.y         68 f T.T.y
 69 k A.A.r         70 s T.C.n/A.G.y   71 a G.C.n         72 e G.A.r
 73 d G.A.y         74 c T.G.y         75 m A.T.G         76 r C.G.n
 77 t A.C.n         78 c T.G.y         79 g G.G.n         80 g G.G.n
 81 a G.C.n         82 a G.C.n         83 e G.A.r         84 g G.G.n
 85 d G.A.y         86 d G.A.y         87 p C.C.n         88 a G.C.n
 89 k A.A.r         90 a G.C.n         91 a G.C.n         92 f T.T.y
 93 n A.A.y         94 s T.C.n/A.G.y   95 l T.T.r/C.T.n   96 q C.A.r
 97 a G.C.n         98 s T.C.n/A.G.y   99 a G.C.n        100 t A.C.n
101 e G.A.r        102 y T.A.y        103 i A.T.h        104 g G.G.n
105 y T.A.y        106 a G.C.n        107 w T.G.G        108 a G.C.n
109 m A.T.G        110 v G.T.n        111 v G.T.n        112 v G.T.n
113 i A.T.h        114 v G.T.n        115 g G.G.n        116 a G.C.n
117 t A.C.n        118 i A.T.h        119 g G.G.n        120 i A.T.h
121 k A.A.r        122 l T.T.r/C.T.n  123 f T.T.y        124 k A.A.r
125 k A.A.r        126 f T.T.y        127 t A.C.n        128 s T.C.n/A.G.y
129 k A.A.r        130 a G.C.n        131 s T.C.n/A.G.y
132 stop T.A.r/T.G.A    133 stop T.A.r/T.G.A    134 stop T.A.r/T.G.A
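
The back-translation that Table 3 tabulates can be sketched as a lookup from each amino acid to its ambiguous codon(s). The code below is an illustration under stated assumptions, not the procedure used to prepare the table: the mapping follows the standard genetic code, the entry for histidine is added from the standard code because histidine does not occur in AA_seq2, and the names `AMBIGUOUS_CODONS` and `ambiguous_dna` are hypothetical.

```python
# Ambiguous codons per amino acid, in the notation of Table 3
# (a lower-case IUB letter marks the degenerate position).
AMBIGUOUS_CODONS = {
    "a": ["GCn"], "c": ["TGy"], "d": ["GAy"], "e": ["GAr"], "f": ["TTy"],
    "g": ["GGn"], "h": ["CAy"], "i": ["ATh"], "k": ["AAr"],
    "l": ["TTr", "CTn"], "m": ["ATG"], "n": ["AAy"], "p": ["CCn"],
    "q": ["CAr"], "r": ["CGn", "AGr"], "s": ["TCn", "AGy"], "t": ["ACn"],
    "v": ["GTn"], "w": ["TGG"], "y": ["TAy"],
}


def ambiguous_dna(protein):
    """Return (position, amino acid, ambiguous codons) for each residue."""
    return [
        (pos, aa, AMBIGUOUS_CODONS[aa])
        for pos, aa in enumerate(protein.lower(), start=1)
    ]


# First four residues of AA_seq2 as listed in Table 3.
for pos, aa, codons in ambiguous_dna("mkks"):
    print(pos, aa, "/".join(codons))
```

Amino acids with six codons (l, r, s) need two ambiguous codons each, which is why those residues carry two entries in the table.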
Table 4: Table of Restriction Enzyme Suppliers

Suppliers:

Sigma Chemical Co.
P.O. Box 14508
St. Louis, Mo. 63178

Bethesda Research Laboratories
P.O. Box 6009
Gaithersburg, Maryland, 20877

Boehringer Mannheim Biochemicals
7941 Castleway Drive
Indianapolis, Indiana, 46250

International Biochemicals, Inc.
P.O. Box 9558
New Haven, Connecticut, 06535

New England BioLabs
32 Tozer Road
Beverly, Massachusetts, 01915

Promega
2800 S. Fish Hatchery Road
Madison, Wisconsin, 53711

Stratagene Cloning Systems
11099 North Torrey Pines Road
La Jolla, California, 92037
Table 5: Potential sites in ipbd gene.

Summary of cuts.

Enz = %Acc I     has 3 elective sites : 96 169 281
Enz =  Afl II    has 1 elective sites : 19
Enz =  Apa I     has 2 elective sites : 102 103
Enz =  Asu II    has 1 elective sites : 381
Enz =  Ava III   has 1 elective sites : 314
Enz =  BspM II   has 1 elective sites : 72
Enz =  BssH II   has 2 elective sites : 67 115
Enz = %BstX I    has 1 elective sites : 323
Enz = +Dra II    has 3 elective sites : 102 103 226
Enz = +EcoN I    has 2 elective sites : 62 94
Enz = +Esp I     has 2 elective sites : 57 187
Enz =  Hind III  has 6 elective sites : 9 23 60 287 361 386
Enz =  Kpn I     has 1 elective sites : 48
Enz =  Mlu I     has 1 elective sites : 314
Enz =  Nar I     has 2 elective sites : 238 343
Enz =  Nco I     has 1 elective sites : 323
Enz =  Nhe I     has 3 elective sites : 25 289 388
Enz =  Nru I     has 2 elective sites : 38 65
Enz = +PflM I    has 1 elective sites : 94
Enz =  PmaC I    has 1 elective sites : 228
Enz = +PpuM I    has 2 elective sites : 102 226
Enz = +Rsr II    has 1 elective sites : 102
Enz = +Sfi I     has 2 elective sites : 24 261
Enz =  Spe I     has 3 elective sites : 12 45 379
Enz =  Sph I     has 1 elective sites : 221
Enz =  Stu I     has 5 elective sites : 23 70 150 287 386
Enz = %Sty I     has 6 elective sites : 11 44 143 263 323 383
Enz =  Xba I     has 1 elective sites : 84
Enz =  Xho I     has 1 elective sites : 85
Enz =  Xma III   has 3 elective sites : 70 209 242

Enzymes not cutting ipbd.

Avr II     BamH I     Bcl I     BstE II
EcoR I     EcoR V     Hpa I     Not I
Sac I      Sal I      Sau I     Sma I
Xma I
Table 6: Exposure of amino acid types in T4 lzm & HEWL.

HEADER    HYDROLASE (O-GLYCOSYL)    18-AUG-86    2LZM
COMPND    LYSOZYME (E.C.3.2.1.17)
AUTHOR    L.H.WEAVER, B.W.MATTHEWS

Coordinates from Brookhaven Protein Data Bank: 1LYM.
Only Molecule A was considered.

HEADER    HYDROLASE (O-GLYCOSYL)    29-JUL-82    1LYM
COMPND    LYSOZYME (E.C.3.2.1.17)
AUTHOR    J.HOGLE, S.T.RAO, M.SUNDARALINGAM

Solvent radius = 1.40    Atomic radii in Table 7.
Surface area measured in Å².

                                                 Max exposed
Type    N     <area>    sigma    max      min    (fraction)
ALA     27    211.0     1.47     214.3    207.1     85.1 (0.40)
CYS     10    239.8     3.56     245.5    234.4     38.3 (0.16)
ASP     17    271.1     5.36     281.4    262.5    127.1 (0.47)
GLU     10    297.2     5.78     304.9    285.4    100.7 (0.34)
PHE      8    316.6     5.92     325.4    307.5     99.8 (0.32)
GLY     23    185.5     1.31     188.3    183.3     91.9 (0.50)
HIS      2    297.7     3.23     301.0    294.5     32.9 (0.11)
ILE     16    278.1     3.61     285.6    269.6     57.5 (0.21)
LYS     19    309.2     5.38     321.9    300.1    147.1 (0.48)
LEU     24    282.6     6.75     304.0    269.8    109.9 (0.39)
MET      7    293.0     5.70     299.5    283.1     88.2 (0.30)
ASN     26    273.0     5.75     285.1    262.6    143.4 (0.53)
PRO      5    239.9     2.75     242.1    234.6    128.7 (0.54)
GLN      8    299.5     4.75     305.8    291.5    145.9 (0.49)
ARG     24    344.7     8.66     355.8    326.7    240.7 (0.70)
SER     16    228.6     3.59     236.6    223.3     98.2 (0.43)
THR     18    250.3     3.89     257.2    244.2    139.9 (0.56)
VAL     15    254.3     4.05     261.8    245.7    111.1 (0.44)
TRP      9    359.4     3.38     366.4    355.1    102.0 (0.28)
TYR      9    335.8     4.97     342.0    325.0     72.6 (0.22)
Table 7: Atomic radii

Cα             1.70
O carbonyl     1.52
N amide        1.55
Other atoms    1.80



Table 8

Fraction of DNA molecules having n non-parental bases when
reagents that have fraction M of parental nucleotide are used.

M       .9965      .97716     .92612     .8577      .79433     .63096
f0      .9000      .5000      .1000      .0100      .0010      .000001
f1      .09499     .35061     .2393      .04977     .00777     .0000175
f2      .00485     .1188      .2768      .1197      .0292      .000149
f3      .00016     .0259      .2061      .1854      .0705      .000812
f4      .000004    .00409     .1110      .2077      .1232      .003207
f8      0.         2·10^-7    .00096     .0336      .1182      .080165
f16     0.         0.         0.         5·10^-7    .00006     .027281
f23     0.         0.         0.         0.         0.         .0000089
most    0          0          2          5          7          12

"most" is the value of n having the highest probability.
Table 9: best vgCodon

Program "Find Optimum vgCodon."

INITIALIZE-MEMORY-OF-ABUNDANCES
DO ( t1 = 0.21 to 0.31 in steps of 0.01 )
. DO ( c1 = 0.13 to 0.23 in steps of 0.01 )
. . DO ( a1 = 0.23 to 0.33 in steps of 0.01 )
Comment calculate g1 from other concentrations
. . . g1 = 1.0 - t1 - c1 - a1
. . . IF( g1 .ge. 0.15 )
. . . . DO ( a2 = 0.37 to 0.50 in steps of 0.01 )
. . . . . DO ( c2 = 0.12 to 0.20 in steps of 0.01 )
Comment Force D+E = R + K
. . . . . . g2 = (g1*a2 - .5*a1*a2)/(c1 + 0.5*a1)
Comment Calc t2 from other concentrations.
. . . . . . t2 = 1. - a2 - c2 - g2
. . . . . . IF( g2.gt.0.1 .and. t2.gt.0.1 )
. . . . . . . CALCULATE-ABUNDANCES
. . . . . . . COMPARE-ABUNDANCES-TO-PREVIOUS-ONES
. . . . . . end_IF_block
. . . . . end_DO_loop ! c2
. . . . end_DO_loop ! a2
. . . end_IF_block ! if g1 big enough
. . end_DO_loop ! a1
. end_DO_loop ! c1
end_DO_loop ! t1
WRITE the best distribution and the abundances.
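
The CALCULATE-ABUNDANCES step of the program above can be sketched in Python as follows. This is an illustration, not the original program: it assumes the standard genetic code, assumes the three codon positions are varied independently, and takes the third position to be 50% T and 50% G, as in part A of Table 10. The function and variable names are hypothetical. The example reuses the nucleotide distribution reported in Table 10, part A.

```python
BASES = "TCAG"
# Standard genetic code with codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG;
# '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def abundances(pos1, pos2, pos3):
    """Sum codon probabilities per amino acid.

    pos1, pos2, pos3 map each base to its fraction at that codon position.
    """
    totals = {}
    index = 0
    for b1 in BASES:
        for b2 in BASES:
            for b3 in BASES:
                aa = AMINO[index]
                totals[aa] = totals.get(aa, 0.0) + pos1[b1] * pos2[b2] * pos3[b3]
                index += 1
    return totals


def g2_for_charge_balance(g1, a1, c1, a2):
    """g2 that forces [D]+[E] = [K]+[R] when position 3 is 50% T, 50% G."""
    return (g1 * a2 - 0.5 * a1 * a2) / (c1 + 0.5 * a1)


# Nucleotide distribution reported in Table 10, part A.
p1 = dict(T=0.26, C=0.18, A=0.26, G=0.30)
p2 = dict(T=0.22, C=0.16, A=0.40, G=0.22)
p3 = dict(T=0.5, C=0.0, A=0.0, G=0.5)

ab = abundances(p1, p2, p3)
print(f"W {ab['W']:.4f}  S {ab['S']:.4f}  stop {ab['*']:.4f}")
print(f"g2 from constraint: {g2_for_charge_balance(0.30, 0.26, 0.18, 0.40):.4f}")
```

The constraint follows from [D]+[E] = g1·a2 and [K]+[R] = 0.5·a1·a2 + g2·(c1 + 0.5·a1) under the stated third-position assumption; solving for g2 gives the expression used in the program.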
Table 10: Abundances obtained
from various vgCodons

A. Optimized qfk Codon, Restrained by [D] + [E] = [K] + [R]

        T       C       A       G
1      .26     .18     .26     .30     q
2      .22     .16     .40     .22     f
3      .5      .0      .0      .5      k

Amino                     Amino
acid    Abundance         acid    Abundance
A       4.80%             C       2.86%
D       6.00%             E       6.00%
F       2.86%             G       6.60%
H       3.60%             I       2.86%
K       5.20%             L       6.82%
M       2.86%             N       5.20%
P       2.88%             Q       3.60%
R       6.82%             S       7.02% mfaa
T       4.16%             V       6.60%
W       2.86% lfaa        Y       5.20%
stop    5.20%

[D] + [E] = [K] + [R] = .12
ratio = Abun(W)/Abun(S) = 0.4074

j    (1/ratio)^j    (ratio)^j     stop-free
1    2.454          .4074         .9480
2    6.025          .1660         .8987
3    14.788         .0676         .8520
4    36.298         .0275         .8077
5    89.095         .0112         .7657
6    218.7          4.57·10^-3    .7258
7    536.8          1.86·10^-3    .6881
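
The last block of the table above follows directly from three numbers: the abundance of the least-favored amino acid (lfaa), that of the most-favored amino acid (mfaa), and the stop-codon abundance. A minimal sketch, with hypothetical names, that regenerates those columns for j variegated codons:

```python
def bias_table(lfaa, mfaa, stop, max_j=7):
    """Rows of (j, (1/ratio)^j, ratio^j, stop-free fraction).

    ratio is the lfaa abundance divided by the mfaa abundance; the
    stop-free fraction is the chance that j variegated codons contain
    no stop codon.
    """
    ratio = lfaa / mfaa
    return [
        (j, (1.0 / ratio) ** j, ratio ** j, (1.0 - stop) ** j)
        for j in range(1, max_j + 1)
    ]


# Values from part A above: W = 2.86% (lfaa), S = 7.02% (mfaa), stop = 5.20%.
for j, inverse, direct, stop_free in bias_table(0.0286, 0.0702, 0.052):
    print(f"{j}  {inverse:9.3f}  {direct:.4g}  {stop_free:.4f}")
```

The same function applies to the other parts of Table 10 by substituting their lfaa, mfaa, and stop abundances.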
Table 10: Abundances obtained
from various vgCodon (continued)

B. Unrestrained, optimized

        T       C       A       G
1      .27     .19     .27     .27
2      .21     .15     .43     .21
3      .5      .0      .0      .5

Amino                     Amino
acid    Abundance         acid    Abundance
A       4.05%             C       2.84%
D       5.81%             E       5.81%
F       2.84%             G       5.67%
H       4.08%             I       2.84%
K       5.81%             L       6.83%
M       2.84%             N       5.81%
P       2.85%             Q       4.08%
R       6.83%             S       6.89% mfaa
T       4.05%             V       5.67%
W       2.84% lfaa        Y       5.81%
stop    5.81%

[D] + [E] = 0.1162    [K] + [R] = 0.1264
ratio = Abun(W)/Abun(S) = 0.41176

j    (1/ratio)^j    (ratio)^j      stop-free
1    2.4286         .41176         .9419
2    5.8981         .16955         .8872
3    14.3241        .06981         .8356
4    34.7875        .02875         .7871
5    84.4849        .011836        .74135
6    205.180        .004874        .69828
7    498.3          2.007·10^-3    .6577



Table 10: Abundances obtained
from various vgCodon (continued)

C. Optimized NNT

        T        C        A        G
1      .2071    .2929    .2071    .2929
2      .2929    .2071    .2929    .2071
3      1.       .0       .0       .0

Amino                     Amino
acid    Abundance         acid    Abundance
A       6.06%             C       4.29% lfaa
D       8.58%             E       none
F       6.06%             G       6.06%
H       8.58%             I       6.06%
K       none              L       8.58%
M       none              N       6.06%
P       6.06%             Q       none
R       6.06%             S       8.58%
T       4.29% lfaa        V       8.58%
W       none              Y       6.06%
stop    none

j    (1/ratio)^j    (ratio)^j    stop-free
1    2.0            .5           1.
2    4.0            .25          1.
3    8.0            .125         1.
4    16.0           .0625        1.
5    32.0           .03125       1.
6    64.0           .015625      1.
7    128.0          .0078125     1.
Table 10: Abundances obtained
from various vgCodon (continued)

D. Optimized NNG

        T       C       A       G
1      .23     .21     .23     .33
2      .215    .285    .285    .215
3      .0      .0      .0      1.0

Amino                     Amino
acid    Abundance         acid    Abundance
A       9.40%             C       none
D       none              E       9.40%
F       none              G       7.10%
H       none              I       none
K       6.60%             L       9.50% mfaa
M       4.90%             N       none
P       6.00%             Q       6.00%
R       9.50%             S       6.60%
T       6.60%             V       7.10%
W       4.90% lfaa        Y       none
stop    6.60%

j    (1/ratio)^j    (ratio)^j     stop-free
1    1.9388         .51579        0.934
2    3.7588         .26604        0.8723
3    7.2876         .13722        0.8148
4    14.1289        .07078        0.7610
5    27.3929        3.65·10^-2    0.7108
6    53.109         1.88·10^-2    0.6639
7    102.96         9.72·10^-3    0.6200
Table 10: Abundances obtained
from optimum vgCodon (continued)

E. Unoptimized NNS (NNK gives identical distribution)

        T       C       A       G
1      .25     .25     .25     .25
2      .25     .25     .25     .25
3      .0      .5      .0      .5

Amino                     Amino
acid    Abundance         acid    Abundance
A       6.25%             C       3.125%
D       3.125%            E       3.125%
F       3.125%            G       6.25%
H       3.125%            I       3.125%
K       3.125%            L       9.375%
M       3.125%            N       3.125%
P       6.25%             Q       3.125%
R       9.375%            S       9.375%
T       6.25%             V       6.25%
W       3.125%            Y       3.125%
stop    3.125%

j    (1/ratio)^j    (ratio)^j     stop-free
1    3.0            .33333        .96875
2    9.0            .11111        .93853
3    27.0           .03704        .90915
4    81.0           .01234567     .8807
5    243.0          .0041152      .8532
6    729.0          1.37·10^-3    .82655
7    2187.0         4.57·10^-4    .8007
Table 11: Calculate worst codon.

Program "Find worst vgCodon within Serr of given distribution."

INITIALIZE-MEMORY-OF-ABUNDANCES
Comment Serr is % error level.
READ Serr
Comment T1i, C1i, A1i, G1i, T2i, C2i, A2i, G2i, T3i, G3i
Comment are the intended nt-distribution.
READ T1i, C1i, A1i, G1i
READ T2i, C2i, A2i, G2i
READ T3i, G3i
Fdwn = 1. - Serr
Fup = 1. + Serr
DO ( t1 = T1i*Fdwn to T1i*Fup in 7 steps )
. DO ( c1 = C1i*Fdwn to C1i*Fup in 7 steps )
. . DO ( a1 = A1i*Fdwn to A1i*Fup in 7 steps )
. . . g1 = 1. - t1 - c1 - a1
. . . IF( (g1-G1i)/G1i .lt. -Serr )
Comment g1 too far below G1i, push it back
. . . . g1 = G1i*Fdwn
. . . . factor = (1.-g1)/(t1 + c1 + a1)
. . . . t1 = t1*factor
. . . . c1 = c1*factor
. . . . a1 = a1*factor
. . . end_IF_block
. . . IF( (g1-G1i)/G1i .gt. Serr )
Comment g1 too far above G1i, push it back
. . . . g1 = G1i*Fup
. . . . factor = (1.-g1)/(t1 + c1 + a1)
. . . . t1 = t1*factor
. . . . c1 = c1*factor
. . . . a1 = a1*factor
. . . end_IF_block
. . . DO ( a2 = A2i*Fdwn to A2i*Fup in 7 steps )
. . . . DO ( c2 = C2i*Fdwn to C2i*Fup in 7 steps )
. . . . . DO ( g2 = G2i*Fdwn to G2i*Fup in 7 steps )
Comment Calc t2 from other concentrations.
. . . . . . t2 = 1. - a2 - c2 - g2
. . . . . . IF( (t2-T2i)/T2i .lt. -Serr )
Comment t2 too far below T2i, push it back
. . . . . . . t2 = T2i*Fdwn
. . . . . . . factor = (1.-t2)/(a2 + c2 + g2)
. . . . . . . a2 = a2*factor
. . . . . . . c2 = c2*factor
. . . . . . . g2 = g2*factor
. . . . . . end_IF_block
. . . . . . IF( (t2-T2i)/T2i .gt. Serr )
Comment t2 too far above T2i, push it back
. . . . . . . t2 = T2i*Fup
. . . . . . . factor = (1.-t2)/(a2 + c2 + g2)
. . . . . . . a2 = a2*factor
. . . . . . . c2 = c2*factor
. . . . . . . g2 = g2*factor
. . . . . . end_IF_block
. . . . . . IF( g2.gt.0.0 .and. t2.gt.0.0 )
. . . . . . . t3 = 0.5*(1.-Serr)
. . . . . . . g3 = 1. - t3
. . . . . . . CALCULATE-ABUNDANCES
. . . . . . . COMPARE-ABUNDANCES-TO-PREVIOUS-ONES
. . . . . . . t3 = 0.5
. . . . . . . g3 = 1. - t3
. . . . . . . CALCULATE-ABUNDANCES
. . . . . . . COMPARE-ABUNDANCES-TO-PREVIOUS-ONES
. . . . . . . t3 = 0.5*(1.+Serr)
. . . . . . . g3 = 1. - t3
. . . . . . . CALCULATE-ABUNDANCES
. . . . . . . COMPARE-ABUNDANCES-TO-PREVIOUS-ONES
. . . . . . end_IF_block
. . . . . end_DO_loop ! g2
. . . . end_DO_loop ! c2
. . . end_DO_loop ! a2
. . end_DO_loop ! a1
. end_DO_loop ! c1
end_DO_loop ! t1
WRITE the WORST distribution and the abundances.
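
The "push it back" step that the program above applies to g1 and t2 can be isolated as a small function. The sketch below is an illustrative reading of that step, with hypothetical names: three fractions are varied freely, the fourth is whatever remains, and if the fourth strays more than Serr (as a relative error) from its intended value it is clamped to the bound and the other three are rescaled so that all four again sum to one.

```python
def clamp_and_rescale(free, target_last, serr):
    """Clamp the dependent fraction and rescale the free ones.

    free        -- the three independently varied fractions
    target_last -- intended value of the fourth (dependent) fraction
    serr        -- allowed relative error, e.g. 0.05 for 5%
    Returns (rescaled free fractions, dependent fraction).
    """
    last = 1.0 - sum(free)
    relative_error = (last - target_last) / target_last
    if relative_error < -serr:
        last = target_last * (1.0 - serr)   # too far below: push it back
    elif relative_error > serr:
        last = target_last * (1.0 + serr)   # too far above: push it back
    else:
        return list(free), last
    factor = (1.0 - last) / sum(free)
    return [x * factor for x in free], last


# Example with made-up inputs: the remainder 0.10 is far below the
# intended 0.25, so it is raised to 0.25 * 0.95 and the rest shrink.
scaled, g1 = clamp_and_rescale([0.30, 0.30, 0.30], target_last=0.25, serr=0.05)
print(scaled, g1)
```

The example inputs are arbitrary and chosen only to trigger the clamp; they are not taken from the tables.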
Table 12: Abundances obtained
using optimum vgCodon assuming
5% errors

Amino                     Amino
acid    Abundance         acid    Abundance
A       4.59%             C       2.76%
D       5.45%             E       6.02%
F       2.49% lfaa        G       6.63%
H       3.59%             I       2.71%
K       5.73%             L       6.71%
M       3.00%             N       5.19%
P       3.02%             Q       3.97%
R       7.68% mfaa        S       7.01%
T       4.37%             V       6.00%
W       3.05%             Y       4.77%
stop    5.27%

ratio = Abun(F)/Abun(R) = 0.3248

j    (1/ratio)^j    (ratio)^j     stop-free
1    3.079          .3248         .9473
2    9.481          .1055         .8973
3    29.193         .03425        .8500
4    89.888         .01112        .8052
5    276.78         3.61·10^-3    .7627
6    852.22         1.17·10^-3    .7225
7    2624.1         3.81·10^-4    .6844
Table 13: BPTI Homologues

R#     1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
-3
-2     -  -  -  Q  T  -  -  -  -  -  -  Q  -  -  -  H  G  Z  -
-1     -  -  -  T  E  -  -  -  -  -  -  P  -  -  -  D  D  G  -
 1     R  R  R  P  R  R  R  R  R  R  R  L  A  R  R  R  K  R  A
 2     P  P  P  P  P  P  P  P  P  P  P  R  A  P  P  P  R  P  A
 3     D  D  D  D  D  D  D  D  D  D  D  K  K  D  R  T  D  S  K
 4     F  F  F  L  F  F  F  F  F  F  F  L  Y  F  F  F  I  F  Y
 5     C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C
 6     L  L  L  Q  L  L  L  L  L  L  L  I  K  E  E  N  R  N  K
 7     E  E  E  L  E  E  E  E  E  E  E  L  L  L  L  L  L     L
 8     P  P  P  P  P  P  P  P  P  P  P  H  P  P  P  P  P  P  P
 9     P  P  P  Q  P  P  P  P  P  P  P  R  L  A  A  P  P  A  V
10     Y  Y  Y  A  Y  Y  Y  Y  Y  Y  Y  N  R  E  E  E  E  E  R
11     T  T  T  R  T  T  T  T  T  T  T  P  I  T  T  S  Q  T  Y
12     G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G
13     P  P  P  P  P  P  P  P  P  P  P  R  P  L  L  R  P  P  P
14     C  T  A  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C
15     K  K  K  K  K  V  G  A  L  I  K  Y  K  K  K  R  K  K  K
16     A  A  A  A  A  A  A  A  A  A  A  Q  R  A  A  G  G  A  K
17     R  R  R  A  A  R  R  R  R  R  R  K  K  Y  R  H  R  S  K
18     I  I  I  L  M  I  I  I  I  I  I  I  I  I  I  I  L  I  F
19     I  I  I  L  I  I  I  I  I  I  I  P  P  R  R  R  P  R  P
20     R  R  R  R  R  R  R  R  R  R  R  A  S  S  S  R  R  Q  S


21 


Y 


Y 


Y 


Y 


Y 


Y 


Y 


Y 


Y 


Y 


Y 


F 


F 


F 
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I 


Y 


Y 


F 
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N 


N 


N 


N 
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A 


A 


A 


S 


A 


A 


A 


A 


A 


A 


A 


Q 


W 


L 


R 


L 


P 


S 


W 


26 


K 


K 
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T 


K 


K 


K 


K 


K 


K 


K 


K 
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A 
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A 


K 


K 


27 


A 


A 


A 


S 


A 


A 


A 


A 


A 


A 


A 


K 


A 


A 


A 


S 


S 


S 


A 


28 
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G 
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N 


G 


G 


G 


G 


G 


G 


G 


K 


K 


Q 


Q 


N 


R 


G 


K 


29 
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L 


L 


A 


F 


L 


L 


L 


L 


L 


L 


Q 


Q 


Q 


Q 


K 


M 


G 


Q 


30 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


c 


C 


C 


C 


C 


C 


c 


31 
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Q 


Q 


E 


E 


Q 


Q 


Q 


Q 


Q 


Q 


E 


L 


L 


Li 


K 


E 


Q 


L 


32 


T 


T 


T 


P 


T 


T 


T 


T 


T 


T 


T 


G 


P 


Q 


E 


V 


S 


Q 


P 


33 


F 


F 


F 


F 


F 


F 


F 


F 


F 


F 


F 


F 


F 


F 


F 


F 


F 


F 


F 


34 


V 


V 


V 


T 


V 


V 


V 


V 


V 


V 


V 


T 


D 


I 


I 


F 


I 


I 


N 


35 


Y 


Y 


Y 


Y 


Y 


Y 


Y 


Y 


Y 


Y 


Y 


W 


Y 


Y 


Y 


Y 


Y 


Y 


Y 


36 


G 


G 


G 


G 


G 


G 


G 


G 


G 


G 


G 


S 


S 


G 


G 


G 


G 


G 


S 


37 


G 


G 


G 


G 


G 


G 


G 


G 


G 


G 


G 


G 


G 


G 


G 


G 


G 


G 


G 
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Table 13, continued 



R# 


1 


2 


3 


4 


5 


6 


7 


8 


9 


10 


11 


12 


13 


14 


15 


16 


17 


18 


19 


38 


C 


T 


A 


C 


C 


C 


C 


C 


C 


c 


C 


C 


C 


C 


C 


C 


C 


C 


C 


39 
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R 


R 


Q 


R 


R 


R 


R 


R 


R 


R 


G 


G 


G 


G 


G 


K 


R 


G 


40 
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A 


A 


G 


A 


A 


A 


A 


A 


A 


A 


G 


G 


G 


G 


G 


G 


G 


G 


41 


K 


K 


K 


N 


K 


K 


K 


K 


K 


K 


K 


N 


N 


N 


N 


N 


N 


N 


N 


42 


R 


R 


R 


N 


S 


R 


R 


R 


R 


R 


R 


S 


A 


A 


A 


A 


K 


Q 


A 


43 


N 


N 


N 


N 


N 


N 


N 


N 


N 


N 


N 


N 


N 


N 


N 


N 


N 


N 


N 


44 


N 


N 


N 


N 


N 


N 


N 


N 


N 


N 


N 


R 


R 


R 


R 


N 


N 


R 


R 


45 


F 


F 


F 


F 


F 


F 


F 


F 


F 


F 


F 


F 


F 


F 


F 


F 


F 


F 


F 


46 


K 


K 


K 


E 


K 


K 


K 


K 


K 


K 


K 


K 


K 


K 


K 


E 


K 


D 


K 


47 


S 


S 


S 


T 


S 


S 


S 


S 


S 


S 


S 


T 


T 


T 


T 


T 


T 


T 


T 


48 


A 


A 


A 


T 


A 


A 


A 


A 


A 


A 


A 


I 


I 


I 


I 


R 


K 


T 


I 


49 


E 


E 


E 


E 


E 


E 


E 


E 


E 


E 


E 


E 


E 


D 


D 


D 


A 


Q 


E 


50 


D 


D 


D 


M 


D 


D 


D 


D 


D 


D 


D 


E 


E 


E 


E 


E 


E 


Q 


E 


51 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


52 


M 


M 


M 


L 


M 


M 


M 


M 


M 


M 


E 


R 


R 


R 


H 


R 


V 


Q 


R 


53 


R 


R 


R 


R 


R 


R 


R 


R 


R 


R 


R 


R 


R 


R 


R 


E 


R 


G 


R 


54 


T 


T 


T 


I 


T 


T 


T 


T 


T 


T 


T 


T 


T 


T 


T 


T 


A 


V 


T 


55 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


56 


G 


G 


G 


E 


G 


G 


G 


G 


G 


G 


G 


I 


V 


V 


V 


G 


R 


V 


V 


57 


G 


G 


G 


P 


G 


G 


G 


G 


G 


G 


G 


R 


G 


G 


G 


G 


P 




G 


58 


A 


A 


A 


P 


A 


A 


A 


A 


A 


A 


A 


K 








K 


P 






59 








Q 


























E 






60 








Q 


























R 






61 








T 


























P 






62 








D 
































63 








K 
































64 








S 

































10 



15 



1 BPTI (SEQ ID NO: 44) 

2 Engineered BPTI From MARK87 (SEQ ID NO 

3 Engineered BPTI From MARK87 (SEQ ID NO 

4 Bovine Colostrum (DUFT85) (SEQ ID NO 

5 Bovine Serum (DUFT85) (SEQ ID NO 

6 Semisynthetic BPTI, TSCH87 (SEQ ID NO 

7 Semisynthetic BPTI, TSCH87 (SEQ ID NO 

8 Semisynthetic BPTI, TSCH87 (SEQ ID NO 

9 Semisynthetic BPTI, TSCH87 (SEQ ID NO 

10 emisynthetic BPTI, TSCH87 (SEQ ID NO 

11 Engineered BPTI, AUER87 (SEQ ID NO 
12 




2 71) 
272), 
273) 



2 74) 
215) 



Engineered BPTI, AUER8 7 

Dendroaspis polylepis polylepis (Black mamba) venom 
I (DUFT8 5) (SEQ ID NO : S» JJT-fiJ 
13 Dendroaspis polylepis polylepis (Black Mamba) venom K (DUFT85) (SEQ ID NO:277)
14 Hemachatus hemachates (Ringhals Cobra) HHV II (DUFT85) (SEQ ID NO:278)
15 Naja nivea (Cape cobra) NNV II (DUFT85) (SEQ ID NO:279)
16 Vipera russelli (Russell's viper) RVV II (TAKA74) (SEQ ID NO:280)
17 Red sea turtle egg white (DUFT85) (SEQ ID NO:281)
18 Snail mucus (Helix pomatia) (WAGN78) (SEQ ID NO:282)
19 Dendroaspis angusticeps (Eastern green mamba) C13 S1 C3 toxin (DUFT85) (SEQ ID NO:283)
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Table 13: BPTI Homologues (continued) 



R # 


20 


21 


22 


23 


24 


25 


26 


27 


28 


29 


30 


31 


32 


33 


34 


35 


-5 




























D 






-4 


_ 


_ 


_ 


_ 


_ 


_ 


_ 


_ 


_ 


_ 


_ 


_ 


_. 


E 


_ 


_ 


-3 


_ 


_ 


_ 






_ 


_ 


_ 


_ 


_ 


_ 




T 


P 






-2 


Z 


_ 


L 


Z 


R 


K 


_ 


_ 


_ 


R 


R 




E 


T 


_ 




-1 


P 




Q 


D 


D 


N 








Q 


K 




R 


T 






1 


R 


R 


H 


H 


R 


R 


I 


K 


T 


R 


R 


R 


G 


D 


K 


T 


2 


R 


P 


R 


P 


P 


P 


N 


E 


V 


H 


H 


P 


F 


L 


A 


V 


3 


K 


Y 


T 


K 


K 


T 


G 


D 


A 


R 


P 


D 


L 


P 


D 


E 


4 


L 


A 


F 


F 


F 


F 


D 


S 


A 


D 


D 


F 


D 


I 


S 


A 


5 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


6 


I 


E 


K 


Y 


Y 


N 


E 


Q 


N 


D 


D 


L 


T 


E 


Q 


N 


7 


L 


L 


L 


L 


L 


L 


L 


L 


L 


K 


K 


E 


S 


Q 


L 


L 


8 


H 


I 


P 


P 


P 


L 


P 


G 


P 


P 


P 


P 


P 


A 


D 


P 


9 


R 


V 


A 


A 


A 


P 


K 


Y 


V 


P 


P 


P 


P 


FG 


Y 


I 


10 


N 


A 


E 


D 


D 


E 


V 


S 


I 


D 


D 


Y 


V 


D 


S 


V 


11 


P 


A 


P 


P 


P 


T 


V 


A 


R 


K 


T 


T 


T 


A 


Q 


Q 


12 


G 


G 


G 


G 


G 


G 


G 


G 


G 


G 


K 


G 


G 


G 


G 


G 


13 


R 


P 


P 


R 


R 


R 


P 


P 


P 


N 


1 


P 


P 


L 


P 


P 


14 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


15 


Y 


M 


K 


K 


L 


N 


R 


M 


R 


_ 


_ 


K 


R 


F 


L 


R 


16 


D 
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A 
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A 


A 


G 


A 
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Q 


A 


A 


G 


G 


A 


17 


K 


F 
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H 
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L 


R 


M 


F 


P 


T 


K 


G 


Y 


L 


F 


18 


I 


I 


I 


I 


M 


I 


F 


T 


I 


V 


V 


M 


F 


M 


F 


I 


19 


P 


S 


P 


P 


P 


P 


P 


S 


Q 


R 


R 


I 


K 


K 


K 


Q 


20 


A 


A 


A 


R 


R 


A 


R 


R 


L 


A 


A 


R 


R 


L 


R 


L 


21 


F 


F 


F 


F 


F 


F 


Y 


Y 


W 


F 


F 


Y 


Y 


Y 


Y 


W 


22 
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K 
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C 


C 
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C 


C 
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C 


C 


C 
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C 


C 


C 


C 


C 


31 


E 


Y 
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N 


E 


Q 
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V 


K 


V 


E 


E 


E 


E 


V 


32 


R 


P 


L 


K 


K 


K 


K 


T 


Li 


A 


Q 


T 


P 


E 


T 


R 


33 


F 


F 


F 


F 


F 


F 


F 


F 


F 


F 


F 


F 


F 


F 


F 


F 


34 


D 


T 


H 


I 


I 


N 


1 
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P 


Q 


R 


V 


K 


I 


L 


S 
Table 13, continued



R # 


20 


21 


22 


23 


24 


25 


26 


27 


28 


29 


30 


31 


32 


33 


34 


35 


35 


W 


Y 


Y 


Y 


Y 


Y 


Y 


Y 


Y 


Y 


Y 


Y 


Y 


Y 


Y 


Y 


36 


S 


S 


G 


G 


G 


G 


G 


G 


G 


R 


G 


G 


G 


G 


G 


G 


37 


6 


G 


G 


G 


G 


G 


G 


G 


G 


G 


G 


G 


G 


G 


G 


G 


38 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


39 


G 


R 


K 


P 


R 


G 


G 


M 


Q 


D 


D 


K 


K 


Q 


M 


K 


40 


G 


G 


G 


G 


G 


G 


G 


G 


G 


G 


G 


A 


G 


G 


G 


G 


41 


N 


N 


N 


N 


N 


N 


N 


N 


N 


D 


D 


K 


N 


N 


N 


N 


42 


S 


A 


A 


A 


A 


A 


A 


G 


G 


H 


H 


S 


G 


D 


L 


G 


43 


N 


N 


N 


N 


N 


N 


N 


N 


N 


G 


G 


N 


N 


N 


N 


N 


44 


R 


R 
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N 


K 
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F 


F 


F 


F 


F 


F 


F 


F 


F 


F 


Y 


F 


F 


F 


46 


K 


K 


s 


K 


K 


K 


H 


V 


Y 


K 


K 


R 


K 


S 


L 


Y 
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T 


T 


T 


T 


T 


T 


T 


T 


S 


T 


S 


S 


S 


T 


S 


S 


48 


I 


I 


I 


W 


W 


I 


Li 


E 


E 


E 


D 


A 


E 


L 


Q 


Q 


49 


E 


E 


E 


D 


D 


D 


E 


K 


K 


T 


H 


E 


Q 


A 


K 


K 


50 


E 


E 


K 


E 


E 


E 


E 


E 


E 


L 


L 


D 


D 


E 


E 


E 


51 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


52 


R 


R 


R 


R 


R 


o 


E 


L 


R 


R 


R 


M 


L 


E 


L 


K 


53 


R 


R 


H 


Q 


H 


R 


K 


Q 


E 


C 


C 


R 


D 


Q 


Q 


E 


54 


T 


T 


A 


T 


T 


T 


V 


T 


Y 


E 


E 


T 


A 


K 


T 


Y 


55 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


c 


C 


56 


I 


V 


V 


G 


V 


A 


G 


R 


G 


L 


E 


G 


s 


I 


R 


G 


57 


G 


V 


G 


A 


A 


A 


V 




V 


V 


L 


G 


G 


N 




I 


58 








S 


S 


K 


R 




P 


Y 


Y 


A 


F 






P 


59 








A 


G 


Y 


S 




G 


P 


R 










G 


60 










I 


G 






D 














E 


61 


















E 














A 
20  Dendroaspis angusticeps (Eastern Green Mamba) C13 S2 C3 toxin (DUFT85) (SEQ ID NO:284)

21  Dendroaspis polylepis polylepis (Black Mamba) B toxin (DUFT85) (SEQ ID NO:285)

22  Dendroaspis polylepis polylepis (Black Mamba) E toxin (DUFT85) (SEQ ID NO:286)

23  Vipera ammodytes TI toxin (DUFT85) (SEQ ID NO:287)

24  Vipera ammodytes CTI toxin (DUFT85) (SEQ ID NO:288)

25  Bungarus fasciatus VIII B toxin (DUFT85) (SEQ ID NO:289)

26  Anemonia sulcata (sea anemone) 5 II (DUFT85) (SEQ ID NO:290)

27  Homo sapiens HI-14 "inactive" domain (DUFT85) (SEQ ID NO:291)

28  Homo sapiens HI-8 "active" domain (DUFT85) (SEQ ID NO: )

29  beta bungarotoxin B1 (DUFT85) (SEQ ID NO:293)

30  beta bungarotoxin B2 (DUFT85) (SEQ ID NO: )

31  Bovine spleen TI II (FIOR85) (SEQ ID NO:295)

32  Tachypleus tridentatus (Horseshoe crab) hemocyte inhibitor (NAKA87) (SEQ ID NO:296)

33  Bombyx mori (silkworm) SCI-III (SASA84) (SEQ ID NO: )

34  Bos taurus (inactive) BI-14 (SEQ ID NO:298)

35  Bos taurus (active) BI-8 (SEQ ID NO:299)
Table 13, continued

R #    36   37   38   39   40
 -5
 -4
 -3
 -2
 -1          Z
  1     R    R    R    R    R
  2     P    P    P    P    P
  3     D    D    D    D    D
  4     F    F    F    F    F
  5     C    C    C    C    C
  6     L    L    L    L    L
  7     E    E    E    E    E
  8     P    P    P    P    P
  9     P    P    P    P    P
 10     Y    Y    Y    Y    Y
 11     T    T    T    T    T
 12     G    G    G    G    G
 13     P    P    P    P    P
 14     C    C    C    C    C
 15     R    K    K    K    K
 16     A    A    A    A    A
 17     R    R    R    R    K
 18     I    M    I    M    M
 19     I    I    I    I    I
 20     R    R    R    R    R
 21     Y    Y    Y    Y    Y
 22     F    F    F    F    F
 23     Y    Y    Y    Y    Y
 24     N    N    N    N    N
 25     A    A    A    A    A
 26     K    K    K    K    K
 27     A    A    A    A    A
 28     G    G    G    G    G
 29     L    L    L    L    F
 30     C    C    C    C    C
 31     Q    Q    Q    Q    E
 32     T    P    P    P    T
 33     F    F    F    F    F
 34     V    V    V    V    V
 35     Y    Y    Y    Y    Y
 36     G    G    G    G    G
 37     G    G    G    G    G
 38     C    C    C    C    C
 39     R    R    R    R    K
 40     A    A    A    A    A
 41     K    K    K    K    K
 42     R    S    R    R    S
 43     N    N    N    N    N
 44     N    N    N    N    N
 45     F    F    F    F    F
 46     K    K    K    K    R
 47     S    S    S    S    S
 48     A    A    S    A    A
 49     E    E    E    E    E
 50     D    D    D    D    D
 51     C    C    C    C    C
 52     E    M    M    M    M
 53     R    R    R    R    R
 54     T    T    T    T    T
 55     C    C    C    C    C
 56     G    G    G    G    G
 57     G    G    G    G    G
 58     A    A    A    A    A
 59
 60
 61


36: Engineered BPTI (KR15, ME52): Auerswald '88, Biol Chem Hoppe-Seyler, 369 Supplement, pp. 27-35. (SEQ ID NO: )

37: Isoaprotinin G-1: Siekmann, Wenzel, Schroder, and Tschesche '88, Biol Chem Hoppe-Seyler, 369:157-163. (SEQ ID NO:301)

38: Isoaprotinin 2: Siekmann, Wenzel, Schroder, and Tschesche '88, Biol Chem Hoppe-Seyler, 369:157-163. (SEQ ID NO:302)

39: Isoaprotinin G-2: Siekmann, Wenzel, Schroder, and Tschesche '88, Biol Chem Hoppe-Seyler, 369:157-163. (SEQ ID NO:303)

40: Isoaprotinin 1: Siekmann, Wenzel, Schroder, and Tschesche '88, Biol Chem Hoppe-Seyler, 369:157-163. (SEQ ID NO:304)
Table 13, continued

Notes:

a) both beta bungarotoxins have residue 15 deleted.

b) B. mori has an extra residue between C5 and C14; we have assigned F and G to residue 9.

c) all natural proteins have C at 5, 14, 30, 38, 51, & 55.

d) all homologues have F33 and G37.

e) extra C's in bungarotoxins form interchain cystine bridges.
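
Conservation statements such as notes c) and d) can be checked mechanically once the alignment columns are written out as strings. The following is a minimal sketch, not part of the original disclosure: the helper name `conserved_positions` is hypothetical, the BPTI string follows the residue list of Table 16, and the two other strings are read from the Table 13 columns for entries 36 and 40. With these three sequences the fully conserved cysteine columns come out as 5, 14, 30, 38, 51 and 55.

```python
# Aligned sequences written out from residue 1 to residue 58.
# "BPTI" follows the residue list of Table 16; the other two are read
# from the Table 13 columns for entries 36 and 40 (illustrative subset).
ALIGNED = {
    "BPTI": "RPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGA",
    "BPTI (KR15, ME52)": "RPDFCLEPPYTGPCRARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCERTCGGA",
    "Isoaprotinin 1": "RPDFCLEPPYTGPCKAKMIRYFYNAKAGFCETFVYGGCKAKSNNFRSAEDCMRTCGGA",
}


def conserved_positions(aligned, residue):
    """Return 1-based positions where every sequence carries `residue`."""
    seqs = list(aligned.values())
    length = min(len(s) for s in seqs)
    return [i + 1 for i in range(length) if all(s[i] == residue for s in seqs)]


if __name__ == "__main__":
    print("C:", conserved_positions(ALIGNED, "C"))
    print("F:", conserved_positions(ALIGNED, "F"))
    print("G:", conserved_positions(ALIGNED, "G"))
```

Only three closely related sequences are used here, so more positions appear conserved than would survive across the whole family.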
Identification codes for Tables 14 and 15

1   BPTI
2   Synthetic BPTI, Tan & Kaiser, Biochem. 16(8):1531-41
3   Semisynthetic BPTI, TSCH87
4   Semisynthetic BPTI, TSCH87
5   Semisynthetic BPTI, TSCH87
6   Semisynthetic BPTI, TSCH87
7   Semisynthetic BPTI, TSCH87
8   Engineered BPTI, AUER87
9   BPTI, Auerswald &al, GB 2 208 511A
10  BPTI, Auerswald &al, GB 2 208 511A
11  Engineered BPTI, from MARK87
12  Engineered BPTI, from MARK87
13  BPTI (KR15, ME52): Auerswald '88, Biol Chem Hoppe-Seyler, 369 Suppl, pp. 27-35.
14  BPTI CA30/CA51, Eigenbrot &al, Protein Engineering 3(7):591-598 ('90)
15  Isoaprotinin 2, Siekmann et al '88, Biol Chem Hoppe-Seyler, 369:157-163.
16  Isoaprotinin G-2, Siekmann et al '88, Biol Chem Hoppe-Seyler, 369:157-163.
17  BPTI Engineered, Auerswald &al, GB 2 208 511A
18  BPTI Engineered, Auerswald &al, GB 2 208 511A
19  BPTI Engineered, Auerswald &al, GB 2 208 511A
20  Isoaprotinin G-1, Siekmann &al '88, Biol Chem Hoppe-Seyler, 369:157-163.
21  BPTI Engineered, Auerswald &al, GB 2 208 511A
22  BPTI Engineered, Auerswald &al, GB 2 208 511A
23  Bovine Serum (in Dufton '85)
24  Bovine spleen TI II (FIOR85)
25  Snail mucus (Helix pomatia) (WAGN78)
26  Hemachatus hemachates (Ringhals Cobra) HHV II (in Dufton '85)
27  Red sea turtle egg white (in Dufton '85)
28  Bovine Colostrum (in Dufton '85)
29  Naja nivea (Cape cobra) NNV II (in Dufton '85)
30  Bungarus fasciatus VIII B toxin (in Dufton '85)
31  Vipera ammodytes TI toxin (in Dufton '85)
32  Porcine ITI domain 1 (in CREI87)
33  Human Alzheimer's beta APP protease inhibitor (SHIN90)
34  Equine ITI domain 1, in Creighton & Charles
35  Bos taurus (inactive) BI-8e (ITI domain 1)
36  Anemonia sulcata (sea anemone) 5 II (in Dufton '85)
37  Dendroaspis polylepis polylepis (Black Mamba) E toxin (in Dufton '85)
38  Vipera russelli (Russell's viper) RVV II (TAKA74)
39  Tachypleus tridentatus (Horseshoe crab) hemocyte inhibitor (NAKA87)
40  LACI 2 (Factor Xa) (WUNT88)
41  Vipera ammodytes CTI toxin (in Dufton '85)
42  Dendroaspis polylepis polylepis (Black Mamba) venom K (in Dufton '85)
43  Homo sapiens HI-8e "inactive" domain (in Dufton '85)
44  Green Mamba toxin K (in CREI87)
45  Dendroaspis angusticeps (Eastern Green Mamba) C13 S1 C3 toxin (in Dufton '85)
46  LACI 3
47  Equine ITI domain 2 (CREI87)
48  LACI 1 (VIIa)
49  Dendroaspis polylepis polylepis (Black Mamba) B toxin (in Dufton '85)
50  Porcine ITI domain 2, Creighton and Charles
51  Homo sapiens HI-8t "active" domain (in Dufton '85)
52  Bos taurus (active) BI-8t
53  Trypstatin, Kito &al ('88) J Biol Chem 263(34):18104-07
54  Dendroaspis angusticeps (Eastern Green Mamba) C13 S2 C3 toxin (in Dufton '85)
55  Green Mamba I venom, Creighton & Charles '87, CSHSQB 52:511-519.
56  beta bungarotoxin B2 (in Dufton '85)
57  Dendroaspis polylepis polylepis (Black Mamba) venom I (in Dufton '85)
58  beta bungarotoxin B1 (in Dufton '85)
59  Bombyx mori (silkworm) SCI-III (SASA84)












Table 14: Tally of Ionizable Groups

Identifier    D    E    K    R    Y    H   NH  CO2    +  ions
     1        2    2    4    6    4    0    1    1    6   16
     2        2    2    4    6    4    0    1    1    6   16
     3        2    2    3    6    4    0    1    1    5   15
     4        2    2    3    6    4    0    1    1    5   15
     5        2    2    3    6    4    0    1    1    5   15
     6        2    2    3    6    4    0    1    1    5   15
     7        2    2    3    6    4    0    1    1    5   15
     8        2    3    4    6    4    0    1    1    5   17
     9        2    2    3    5    4    0    1    1    4   14
    10        2    3    3    6    4    0    1    1    4   16
    11        2    2    4    6    4    0    1    1    6   16
    12        2    2    4    6    4    0    1    1    6   16
    13        2    3    3    7    4    0    1    1    5   17
    14        2    2    4    6    4    0    1    1    6   16
    15        2    2    4    6    4    0    1    1    6   16
    16        2    2    4    6    4    0    1    1    6   16
    17        2    2    3    5    4    0    1    1    4   14
    18        2    3    3    5    4    0    1    1    3   15
    19        2    3    3    5    4    0    1    1    3   15
    20        2    2    4    5    4    0    1    1    5   15
    21        2    3    3    4    4    0    1    1    2   14
    22        2    4    3    4    4    0    1    1    1   15
    23        2    4    4    4    4    0    1    1    2   16
    24        2    3    5    4    4    0    1    1    4   16
    25        1    1    2    4    4    0    1    1    4   10
    26        2    3    2    5    3    1    1    1    2   14
    27        2    4    6    8    3    0    1    1    8   22
    28        2    4    2    3    3    0    1    1   -1   13
    29        1    4    2    7    2    2    1    1    4   16
    30        1    2    5    3    4    2    1    1    5   13
    31        4    1    5    3    4    2    1    1    3   15
    32        1    4    3    2    4    1    1    1    0   12
    33        2    6    1    5    3    0    1    1   -2   16
    34        2    4    2    2    3    1    1    1   -2   12
    35        2    2    3    2    4    0    1    1    1   11
    36        1    5    4    5    4    1    1    1    3   17
    37        0    2    6    3    3    3    1    1    7   13
    38        2    5    3    7    3    2    1    1    3   19
    39        3    3    5    5    4    0    1    1    4   18
    40        3    7    4    3    4    0    1    1   -3   19
    41        3    2    4    6    5    1    1    1    5   17
    42        1    2    8    5    4    0    1    1   10   18
    43        1    4    2    2    4    0    1    1   -1   11
    44        1    2    9    4    5    0    1    1   10   18
    45        0    2    8    4    5    0    1    1   10   16
    46        1    3    5    5    3    0    1    1    6   16
    47        3    4    4    3    3    0    1    1    0   16
    48        3    6    5    4    1    1    1    1    0   20
    49        0    3    3    5    5    0    1    1    5   13
    50        2    6    4    2    3    0    1    1   -2   16
    51        2    4    4    3
    52        1    4    6    2    3    0    1    1    3   15
    53        2    2    5    1    4    0    1    1    2   12
    54        2    3    6    8    3    1    1    1    9   21
    55        1    3    6    7    3    1    1    1    9   19
    56        6    2    6    7    4    3    1    1    5   23
    57        0    3    7    7    3    1    1    1   11   19
    58        6    2    5    7    4    2    1    1    4   22
    59        4    7    3    1    4    0    1    1   -7   17
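
The Table 14 tally can be reproduced from a sequence with a few lines of code. The sketch below is illustrative and not part of the original disclosure: the function name `tally_ionizable` is hypothetical, and the relations "+" = (K + R + NH) - (D + E + CO2) and "ions" = D + E + K + R + NH + CO2 are inferred from the table rows, where they hold for every legible entry. Tyr and His are counted but, under that inferred rule, do not enter the charge sums. The BPTI string follows the residue list of Table 16.

```python
# BPTI, residues 1-58, as listed in Table 16.
BPTI = "RPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGA"


def tally_ionizable(sequence):
    """Count ionizable groups in the style of the Table 14 columns.

    Assumption (inferred from the table, not stated in the text):
    each chain contributes one amino terminus (NH) and one carboxy
    terminus (CO2); net charge and ion count use D, E, K, R and the
    two termini only.
    """
    counts = {aa: sequence.count(aa) for aa in "DEKRYH"}
    counts["NH"] = 1
    counts["CO2"] = 1
    positive = counts["K"] + counts["R"] + counts["NH"]
    negative = counts["D"] + counts["E"] + counts["CO2"]
    counts["+"] = positive - negative
    counts["ions"] = positive + negative
    return counts


if __name__ == "__main__":
    print(tally_ionizable(BPTI))
```

For BPTI this gives D 2, E 2, K 4, R 6, Y 4, H 0, a net charge of +6 and 16 ions, in agreement with identifier 1 of the table.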
Table 15: Frequency of Amino Acids at Each Position
in BPTI and 58 Homologues

Res.  Different
Id.   AAs        Contents                                     First
 -5    2         -58 D                                        -
 -4    2         -58 E                                        -
 -3    5         -55 P T Z F                                  -
 -2   10         -43 R3 Z3 Q3 T2 E G H K L                    -
 -1   11         -41 D4 P3 R2 T2 Q2 G K N Z E                 -
  1   13         R35 K6 T4 A3 H2 G2 L M N P I D -             R
  2   10         P35 R6 A4 V4 H3 E3 N F I L                   P
  3   11         D32 K8 S4 A3 T3 R2 E2 P2 G L Y               D
  4    9         F34 A6 D4 L4 S4 Y3 I2 W V                    F
  5    1         C59                                          C
  6   13         L25 N7 E6 K4 Q4 I3 D2 S2 Y2 R F T A          L
  7    7         L28 E25 K2 F Q S T                           E
  8   10         P46 H3 D2 G2 E I K L A Q                     P
  9   12         P30 A9 I4 V4 R3 Y3 L F Q H E K               P
  9a   2         -58 G                                        -
 10    9         Y24 E8 D8 V6 R3 S3 A3 N3 I                   Y
 11   11         T31 Q8 P7 R3 A3 Y2 K S D V I                 T
 12    2         G58 K                                        G
 13    5         P45 R7 L4 I2 N                               P
 14    3         C57 A T                                      C
 15   12         K22 R12 L7 V6 Y3 M2 -2 N I A F G             K
 16    7         A41 G9 F2 D2 K2 Q2 R                         A
 17   14         R19 L8 K7 F5 M4 Y4 H2 A2 S2 G2 I N T P       R
 18    8         I41 M7 F4 L2 V2 E T A                        I
 19   10         I24 P12 R8 K5 S4 Q2 L N E T                  I
 20    5         R39 A8 L6 S5 Q                               R
 21    5         Y35 F17 W5 I L                               Y
 22    6         F32 Y18 A5 H2 S N                            F
 23    2         Y52 F7                                       Y
 24    4         N47 D8 K3 S                                  N
 25   13         A29 S6 Q4 G4 W4 P3 T2 L2 R N K V I           A
 26   11         K31 A9 T5 S3 V3 R2 E2 G H F Q                K
 27    8         A32 S11 K5 T4 Q3 L2 I E                      A
 28    7         G32 K13 N5 M4 Q2 R2 H                        G
 29   10         L22 K13 Q11 A5 F2 R2 N G M T                 L
 30    2         C58 A                                        C
 31   10         Q25 E17 L5 V5 K2 N A R I Y                   Q
 32   11         T25 P11 K4 Q4 L4 R3 E3 G2 S A V              T
 33    1         F59                                          F
 34   13         V24 I10 T5 N3 Q3 D3 K3 F2 H2 R S P L         V
 35    2         Y56 W3                                       Y
 36    3         G50 S8 R                                     G
 37    1         G59                                          G
 38    3         C57 A T                                      C
 39    9         R25 G13 K6 Q4 E3 M3 L2 D2 P                  R
 40    2         G35 A24                                      A
 41    3         N33 K24 D2                                   K
 42   12         R22 A12 G8 S6 Q2 H2 N2 M D E K L             R
 43    2         N57 G2                                       N
 44    3         N40 R14 K5                                   N
 45    2         F58 Y                                        F
 46   11         K39 Y5 E4 S2 V2 D2 R H T A L                 K
 47    2         S36 T23                                      S
 48   11         A23 I11 E6 Q6 L4 K2 T2 W2 S D R              A
 49    8         E37 K8 D6 Q3 A2 P H T                        E
 50    7         E27 D25 K2 L2 M Q Y                          D
 51    2         C58 A                                        C
 52    9         M17 R15 E8 L7 K6 Q2 T2 H V                   M
 53   11         R37 E6 Q5 K2 C2 H2 A N G D W                 R
 54    8         T41 Y5 A4 V3 I2 E2 M K                       T
 55    1         C59                                          C
 56   10         G33 V9 R5 I4 E3 L A S T K                    G
 57   12         G34 V6 -5 A3 R2 I2 P2 D K S L N              G
 58   10         A25 -15 P7 K3 S2 Y2 G2 F D R                 A
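
Each "Contents" entry of Table 15 is a compact tally: a residue code (or "-" for a gap) followed by a count, with a bare code meaning a count of one. Because the table covers BPTI plus 58 homologues, every row should sum to 59 and the number of distinct codes should equal the "Different AAs" column. The sketch below shows one way to check that; the function name `parse_contents` is hypothetical and the two sample strings are the entries for positions 15 and 9a.

```python
import re

TOTAL_SEQUENCES = 59  # BPTI plus 58 homologues


def parse_contents(contents):
    """Turn a string like 'K22 R12 L7 -2 N' into {'K': 22, 'R': 12, ...}.

    A code with no trailing number counts once; '-' stands for a gap.
    """
    tally = {}
    for symbol, count in re.findall(r"([A-Z-])(\d*)", contents):
        tally[symbol] = int(count) if count else 1
    return tally


if __name__ == "__main__":
    for label, text in [
        ("15", "K22 R12 L7 V6 Y3 M2 -2 N I A F G"),
        ("9a", "-58 G"),
    ]:
        tally = parse_contents(text)
        print(label, len(tally), sum(tally.values()))
```

A row whose counts do not add up to 59, or whose number of distinct codes disagrees with the second column, points to a transcription error in that row.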
Table 16: Exposure in BPTI

Coordinates taken from Brookhaven Protein Data Bank entry 6PTI.

HEADER    PROTEINASE INHIBITOR (TRYPSIN)    13-MAY-87    6PTI
COMPND    BOVINE PANCREATIC TRYPSIN INHIBITOR (SEQ ID NO:44)
COMPND  2 (/BPTI$, CRYSTAL FORM /III$)
AUTHOR    A.WLODAWER

Solvent radius = 1.40
Atomic radii given in Table 7
Areas in Å²

                Total    Not covered              Not covered
Residue         area     by M/C       fraction    at all        fraction
ARG   1        342.45     205.09       0.5989      152.49        0.4453
PRO   2        239.12      92.65       0.3875       47.56        0.1989
ASP   3        272.39     158.77       0.5829      143.23        0.5258
PHE   4        311.33     137.82       0.4427       43.21        0.1388
CYS   5        241.06      48.36       0.2006        0.23        0.0010
LEU   6        280.98     151.45       0.5390      115.87        0.4124
GLU   7        291.39     128.91       0.4424       90.39        0.3102
PRO   8        236.12     128.71       0.5451       99.98        0.4234
PRO   9        236.09     109.82       0.4652       45.80        0.1940
TYR  10        330.97     153.63       0.4642       79.49        0.2402
THR  11        249.20      80.10       0.3214       64.99        0.2608
GLY  12        184.21      56.75       0.3081       23.05        0.1252
PRO  13        240.07     130.25       0.5426       75.27        0.3136
CYS  14        237.10      75.55       0.3186       53.52        0.2257
LYS  15        310.77     200.25       0.6444      192.00        0.6178
ALA  16        209.41      66.63       0.3182       45.59        0.2177
ARG  17        351.09     243.67       0.6940      201.48        0.5739
ILE  18        277.10     100.51       0.3627       58.95        0.2127
ILE  19        278.03     146.06       0.5254       96.05        0.3455
ARG  20        339.11     144.65       0.4266       43.81        0.1292
TYR  21        333.60     102.24       0.3065       69.67        0.2089
PHE  22        306.08      70.64       0.2308       23.01        0.0752
TYR  23        338.66      77.05       0.2275       17.34        0.0512
ASN  24        264.88      99.03       0.3739       38.69        0.1461
ALA  25        211.15      85.13       0.4032       48.20        0.2283
LYS  26        313.29     216.14       0.6899      202.84        0.6474
ALA  27        210.66      96.05       0.4560       54.78        0.2601
GLY  28        186.83      71.52       0.3828       32.09        0.1718
LEU  29        280.70     132.42       0.4718       93.61        0.3335
CYS  30        238.15      57.27       0.2405       19.33        0.0812
GLN  31        301.15     141.80       0.4709       82.64        0.2744
THR  32        251.26     138.17       0.5499       76.47        0.3043
PHE  33        304.27      59.79       0.1965       18.91        0.0622
VAL  34        251.56     109.78       0.4364       42.36        0.1684
TYR  35        332.64      80.52       0.2421       15.05        0.0452
GLY  36        187.06      11.90       0.0636        1.97        0.0105
GLY  37        185.28      84.26       0.4548       39.17        0.2114
CYS  38        234.56      73.64       0.3139       26.40        0.1125
ARG  39        417.13     304.62       0.7303      250.73        0.6011
ALA  40        209.53      94.01       0.4487       52.95        0.2527
LYS  41        314.60     166.23       0.5284      108.77        0.3457
ARG  42        349.06     232.83       0.6670      179.59        0.5145
ASN  43        266.47      38.53       0.1446        5.32        0.0200
ASN  44        269.65      91.08       0.3378       23.39        0.0867
PHE  45        313.22      69.73       0.2226       14.79        0.0472
LYS  46        309.83     217.18       0.7010      155.73        0.5026
SER  47        224.78      69.11       0.3075       24.80        0.1103
ALA  48        211.01      82.06       0.3889       31.07        0.1473
GLU  49        286.62     161.00       0.5617      100.01        0.3489
ASP  50        299.53     156.42       0.5222       95.96        0.3204
CYS  51        238.68      24.51       0.1027        0.00        0.0000
MET  52        293.05      89.48       0.3054       66.70        0.2276
ARG  53        356.20     224.61       0.6306      189.75        0.5327
THR  54        251.53     116.43       0.4629       51.64        0.2053
CYS  55        240.40      69.95       0.2910        0.00        0.0000
GLY  56        184.66      60.79       0.3292       32.78        0.1775
GLY  57        106.58      49.71       0.4664       38.28        0.3592
ALA  58        no position given in Protein Data Bank

"Total area" is the area measured by a rolling sphere of radius 1.4 Å, where only the atoms within the residue are considered. This takes account of conformation.

"Not covered by M/C" is the area measured by a rolling sphere of radius 1.4 Å where all main-chain atoms are considered. "Fraction" is the exposed area divided by the total area. Surface buried by main-chain atoms is more definitely covered than is surface covered by side-group atoms.

"Not covered at all" is the area measured by a rolling sphere of radius 1.4 Å where all atoms of the protein are considered.
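
The two "fraction" columns of Table 16 are simple ratios of an exposed area to the total area of the isolated residue. A minimal sketch of that arithmetic follows; the function name `exposed_fraction` is hypothetical, and the sample values are the ARG 1 and LYS 15 rows of the table. It does not compute the rolling-sphere areas themselves, only the ratio.

```python
def exposed_fraction(exposed_area, total_area):
    """Exposed area divided by the total area of the isolated residue."""
    if total_area <= 0:
        raise ValueError("total area must be positive")
    return exposed_area / total_area


if __name__ == "__main__":
    # (residue, total area, not covered by M/C, not covered at all), in square angstroms
    rows = [
        ("ARG 1", 342.45, 205.09, 152.49),
        ("LYS 15", 310.77, 200.25, 192.00),
    ]
    for name, total, not_by_mc, not_at_all in rows:
        print(
            name,
            round(exposed_fraction(not_by_mc, total), 4),
            round(exposed_fraction(not_at_all, total), 4),
        )
```

Recomputing the ratios in this way is also a convenient check on the transcription of each row, since the tabulated fraction must agree with the two areas beside it.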
Table 17: Plasmids used in Detailed Example I

Phage    Contents
LG1      M13mp18 with Ava II/Aat II/Acc I/Rsr II/Sau I adaptor
pLG2     LG1 with amp R and ColE1 of pBR322 cloned into Aat II/Acc I sites
pLG3     pLG2 with Acc I site removed
pLG4     pLG3 with first part of osp-pbd gene cloned into Rsr II/Sau I sites, Avr II/Asu II sites created
pLG5     pLG4 with second part of osp-pbd gene cloned into Avr II/Asu II sites, BssH I site created
pLG6     pLG5 with third part of osp-pbd gene cloned into Asu II/BssH I sites, Bbe I site created
pLG7     pLG6 with last part of osp-pbd gene cloned into Bbe I/Asu II sites
pLG8     pLG7 with disabled osp-pbd gene, same length DNA.
pLG9     pLG7 mutated to display BPTI (V15 BPTI)
pLG10    pLG8 + tet R gene - amp R gene
pLG11    pLG9 + tet R gene - amp R gene
Table 18: Enzyme sites eliminated when
M13mp18 is cut by Ava II and Bsu36I

Aha II      Nar I       Gdi II      Pvu I
Fsp I       Bgl I       HgiE II     Bsu36I
EcoRI       SacI        KpnI        XmaI
SmaI        BamHI       XbaI        SalI
Hind III    AccI        PstI        Sph I
Hind II



Table 19: Enzymes not cutting M13mp18

AatII       Afl II      Apa I       AvrII
Bbv II      Bcl I       BspM I      BssHI
BstB I      BstE II     BstX I      EagI
Eco57I      EcoNI       EcoO109I    EcoRV
EspI        HpaI        MluI        NcoI
NheI        Not I       Nru I       NsiI
PflMI       PmaCI       Ppa I       PpuM I
RsrI        Sac I       Sca I       SfiI
SpeI        StuI        StyI        Tth111I
XcaI        XhoI
Table 20: Enzymes cutting Amp R gene and ori

AatII       BbvII       Eco57I      Ppa I
Sca I       Tth111I     Aha II      GdiII
PvuI        Fsp I       BglI        HgiE II
Hind II     PstI        Xba I       AflIII
NdeI
Table 21: Enzymes tested on Ambig DNA

Enzyme     Recognition      Symm  cuts     Supply
%AccI      GTMKAC           P     2 & 4    <B,M,I,N,P,T
AflII      CTTAAG           P     1 & 5    <N
ApaI       GGGCCC           P     5 & 1    <M,I,N,P,T
AsuII      TTCGAA           P     2 & 4    <P,N (BstBI)
AvaIII     ATGCAT           P     5 & 1    <T; NsiI:M,N,P,T; EcoT22I:T
AvrII      CCTAGG           P     1 & 5    <N
BamHI      GGATCC           P     1 & 5    <S,B,M,I,N,P,T
BclI       TGATCA           P     1 & 5    <S,B,M,I,N,T
BspMII     TCCGGA           P     1 & 5    <N
BssHII     GCGCGC           P     1 & 5    <N,T
+BstEII    GGTNACC          P     1 & 6    <S,B,M,N,T
%BstXI     CCANNNNN         P     8 & 4    <N,P,T
+DraII     RGGNCCY          P     2 & 5    <M,T; EcoO109I:N
+EcoNI     CCTNNNNN         P     5 & 6    <N (soon)
EcoRI      GAATTC           P     1 & 5    <S,B,M,I,N,P,T
EcoRV      GATATC           P     3 & 3    <S,B,M,I,N,P,T
+EspI      GCTNAGC          P     2 & 5    <T
HindIII    AAGCTT           P     1 & 5    <S,B,M,I,N,P,T
HpaI       GTTAAC           P     3 & 3    <S,B,M,I,N,P,T
KpnI       GGTACC           P     5 & 1    <S,B,M,I,N,P,T; Asp718:M
MluI       ACGCGT           P     1 & 5    <M,N,P,T
NarI       GGCGCC           P     2 & 4    <B,N,T
NcoI       CCATGG           P     1 & 5    <B,M,N,P,T
NheI       GCTAGC           P     1 & 5    <M,N,P,T
NotI       GCGGCCGC         P     2 & 6    <M,N,P,T
NruI       TCGCGA           P     3 & 3    <B,M,N,T
+PflMI     CCANNNNN         P     7 & 4    <N
PmaCI      CACGTG           P     3 & 3    <none
+PpuMI     RGGWCCY          P     2 & 5    <N
+RsrII     CGGWCCG          P     2 & 5    <N,T
SacI       GAGCTC           P     5 & 1    <B (SstI),M,I,N,P,T
SalI       GTCGAC           P     1 & 5    <B,M,I,N,P,T
+SauI      CCTNAGG          P     2 & 5    <M; CvnI:B; MstII:T; Bsu36I:N; AocI:T
+SfiI      GGCCNNNNNGGCC    P     8 & 5    <N,P,T (SEQ ID NO:151)
SmaI       CCCGGG           P     3 & 3    <B,M,I,N,P,T
SpeI       ACTAGT           P     1 & 5    <M,N,T
SphI       GCATGC           P     5 & 1    <B,M,I,N,P,T
StuI       AGGCCT           P     3 & 3    <M,N,I (AatI),P,T
%StyI      CCWWGG           P     1 & 5    <N,P,T
XcaI       GTATAC           P     3 & 3    <N (soon)
XhoI       CTCGAG           P     1 & 5    <B,M,I,P,T; CcrI:T; PaeR7I:N
XmaI       CCCGGG           P     1 & 5    <I,N,P,T
XmaIII     CGGCCG           P     1 &      Eco52I:T

N restrct = 43
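
Several recognition sequences in Table 21 use IUPAC ambiguity codes (M, K, R, Y, W, N), and the "Symm" column marks each site as palindromic (P). A minimal sketch of how such sites can be expanded to a regular expression and tested for symmetry is given below. It is illustrative only: the function names are hypothetical, and a site is called palindromic here when it equals its own reverse complement under the standard IUPAC complement rules.

```python
import re

# Standard IUPAC nucleotide codes and their complements.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A",
    "R": "Y", "Y": "R", "M": "K", "K": "M", "S": "S", "W": "W",
    "B": "V", "V": "B", "D": "H", "H": "D", "N": "N",
}


def site_to_regex(site):
    """Expand an IUPAC recognition site into a regular expression."""
    return "".join(
        code if len(IUPAC[code]) == 1 else "[" + IUPAC[code] + "]"
        for code in site
    )


def reverse_complement(site):
    """Reverse complement of a site, keeping ambiguity codes."""
    return "".join(COMPLEMENT[code] for code in reversed(site))


def is_palindromic(site):
    """True when the site reads the same on both strands."""
    return reverse_complement(site) == site


if __name__ == "__main__":
    for site in ["GTMKAC", "RGGWCCY", "CCTNAGG", "GGCCNNNNNGGCC"]:
        print(site, site_to_regex(site), is_palindromic(site))
```

The AccI pattern GTMKAC expands to `GT[AC][GT]AC`, which matches both GTATAC (the XcaI site) and GTCGAC (the SalI site), consistent with those enzymes appearing separately in the table.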
Table 22: ipbd gene (SEQ ID NO:152)

pbd mod10 29III88 :
lacUV5 RsrII/AvrII/gene/TrpA attenuator/MstII; !

5'- CGGaCCG TaT                                        ! Rsr II site
    CCAGGC tttaca CTTTATGCTTCCGGCTCG tataat GTG TGG    ! lacUV5
    aATTGTGAGCGGATAACAATT                              ! lacO
    CCT AGGAgg CtcaCT                                  ! Shine-Dalgarno
    atg aag aaa tct ctg gtt ctt aag gct agc            ! 10, M13 leader
    gtt gct gtc gcg acc ctg gta ccg atg ctg            ! 20
    tct ttt gct cgt ccg gat ttc tgt ctc gag            ! 30
    ccg cca tat act ggg ccc tgc aaa gcg cgc            ! 40
    atc atc cgt tat ttc tac aac gct aaa gca            ! 50
    ggc ctg tgc cag acc ttt gta tac ggt ggt            ! 60
    tgc cgt gct aag cgt aac aac ttt aaa tcg            ! 70
    gcc gaa gat tgc atg cgt acc tgc ggt ggc            ! 80
    gcc gct gaa ggt gat gat ccg gcc aaa gcg            ! 90
    gcc ttt aac tct ctg caa gct tct gct acc            ! 100
    gaa tat atc ggt tac gcg tgg gcc atg gtg            ! 110
    gtg gtt atc gtt ggt gct acc atc ggt atc            ! 120
    aaa ctg ttt aag aaa ttt act tcg aaa gcg            ! 130
    tct taa tag tga ggttacc                            ! BstEII
    agtcta agcccgc ctaatga gcgggct tttttttt            ! terminator
    CCTgAGG -3'                                        ! Mst II
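
The codon rows of Table 22 can be translated with the standard genetic code to confirm what the gene encodes. The sketch below is illustrative and not part of the original disclosure: the function name `translate` is hypothetical, and the codon table is built from the usual 64-letter string in TCAG order. The two sample inputs are the first codon row of the table and the ten codons beginning with `cgt ccg gat`, which translate to the first ten residues of BPTI as listed in Table 16.

```python
# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate DNA (spaces and case ignored) up to the first stop codon."""
    dna = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("atg aag aaa tct ctg gtt ctt aag gct agc"))
    print(translate("cgt ccg gat ttc tgt ctc gag ccg cca tat"))
```

Translating the full set of codon rows in the same way is a quick check that no codon was mis-transcribed, since a single wrong base usually changes a residue or introduces a stop.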
Table 23: ipbd DNA sequence (SEQ ID NO:152)

DNA Sequence file = UV5_M13PTIM13.DNA;17
DNA Sequence title =
pbd mod10 29III88 : lac-UV5 RsrII/AvrII/gene/TrpA attenuator/MstII; !

  1   C GGA CCG TAT CCA GGC TTT ACA CTT TAT GCT TCC GGC TCG|
 41 TAT AAT GTG TGG AAT TGT GAG CGG ATA ACA ATT CCT AGG AGG|
 83 CTC ACT ATG AAG AAA TCT CTG GTT CTT AAG GCT AGC GTT GCT|
125 GTC GCG ACC CTG GTA CCG ATG CTG TCT TTT GCT CGT CCG GAT|
167 TTC TGT CTC GAG CCG CCA TAT ACT GGG CCC TGC AAA GCG CGC|
209 ATC ATC CGT TAT TTC TAC AAC GCT AAA GCA GGC CTG TGC CAG|
251 ACC TTT GTA TAC GGT GGT TGC CGT GCT AAG CGT AAC AAC TTT|
293 AAA TCG GCC GAA GAT TGC ATG CGT ACC TGC GGT GGC GCC GCT|
335 GAA GGT GAT GAT CCG GCC AAA GCG GCC TTT AAC TCT CTG CAA|
377 GCT TCT GCT ACC GAA TAT ATC GGT TAC GCG TGG GCC ATG GTG|
419 GTG GTT ATC GTT GGT GCT ACC ATC GGT ATC AAA CTG TTT AAG|
461 AAA TTT ACT TCG AAA GCG TCT TAA TAG TGA GGT TAC CAG TCT|
503 AAG CCC GCC TAA TGA GCG GGC TTT TTT TTT CCT GAG G

Total = 539 bases
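
A sequence listed in this form can be scanned for restriction sites with a plain substring search. The sketch below is illustrative only: the function name `find_sites` is hypothetical and only the first three rows of the listing (bases 1 to 124) are included. It reports the 1-based position of the first base of each site. The positions given in Table 24 appear to lie two bases further along (for example Avr II at 76, Afl II at 109 and Nhe I at 115); that offset is an inference from the numbers, not something the text states.

```python
# First three rows of the Table 23 listing (bases 1-124).
ROWS = [
    "C GGA CCG TAT CCA GGC TTT ACA CTT TAT GCT TCC GGC TCG",
    "TAT AAT GTG TGG AAT TGT GAG CGG ATA ACA ATT CCT AGG AGG",
    "CTC ACT ATG AAG AAA TCT CTG GTT CTT AAG GCT AGC GTT GCT",
]
SEQUENCE = "".join("".join(row.split()) for row in ROWS)


def find_sites(sequence, site):
    """Return 1-based start positions of every occurrence of `site`."""
    starts = []
    pos = sequence.find(site)
    while pos != -1:
        starts.append(pos + 1)
        pos = sequence.find(site, pos + 1)  # allow overlapping hits
    return starts


if __name__ == "__main__":
    for name, site in [("Avr II", "CCTAGG"), ("Afl II", "CTTAAG"), ("Nhe I", "GCTAGC")]:
        print(name, find_sites(SEQUENCE, site))
```

Sites containing ambiguity codes would first need to be expanded, for instance into a regular expression, before searching.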






Table 24: Summary of Restriction Cuts

Enz = %Acc I has 1 observed sites : 259
Enz = Acc III has 1 observed sites : 162
Enz = Acy I has 1 observed sites : 328
Enz = Afl II has 1 observed sites : 109
Enz = %Afl III has 1 observed sites : 404
Enz = Aha III has 1 observed sites : 292
Enz = Apa I has 1 observed sites : 193
Enz = Asp718 has 1 observed sites : 138
Enz = Asu II has 1 observed sites : 471
Enz = %Ava I has 1 observed sites : 175
Enz = Avr II has 1 observed sites : 76
Enz = %Ban I has 3 observed sites : 138 328 540
Enz = Bbe I has 1 observed sites : 328
Enz = +Bgl I has 1 observed sites : 352
Enz = +Bin I has 1 observed sites : 346
Enz = %BspM I has 1 observed sites : 319
Enz = BssH II has 1 observed sites : 205
Enz = +BstE II has 1 observed sites : 493
Enz = %BstX I has 1 observed sites : 413
Enz = Cfr I has 2 observed sites : 299 350
Enz = +Dra II has 1 observed sites : 193
Enz = +Esp I has 1 observed sites : 277
Enz = %Fok I has 1 observed sites : 213
Enz = Gdi II has 2 observed sites : 299 350
Enz = Hae I has 1 observed sites : 240
Enz = Hae II has 1 observed sites : 328
Enz = +Hga I has 1 observed sites : 478
Enz = %HgiC I has 3 observed sites : 138 328 540
Enz = %HgiJ II has 1 observed sites : 193
Enz = Hind III has 1 observed sites : 377
Enz = +Hph I has 1 observed sites : 340
Enz = Kpn I has 1 observed sites : 138
Enz = +Mbo II has 2 observed sites : 93 304
Enz = Mlu I has 1 observed sites : 404
Enz = Nar I has 1 observed sites : 328
Enz = Nco I has 1 observed sites : 413
Enz = Nhe I has 1 observed sites : 115
Enz = Nru I has 1 observed sites : 128
Enz = Nsp(7524) has 1 observed sites : 311
Enz = NspB II has 1 observed sites : 332
Enz = +PflM I has 1 observed sites : 184
Enz = +Pss I has 1 observed sites : 193
Enz = +Rsr II has 1 observed sites :
Enz = +Sau I has 1 observed sites : 535
Enz = %SfaN I has 2 observed sites : 144 209
Enz = +Sfi I has 1 observed sites : 351
Enz = Sph I has 1 observed sites : 311
Enz = Stu I has 1 observed sites : 240
Enz = %Sty I has 2 observed sites : 76 413






Enz = 


Xca I 


has 1 


observed sites : 


259 






Enz = 


Xho I 


has 1 


observed sites : 


175 






Enz = 


Xma III has 


1 observed sites 


: 299 






Enzymes that do not cut

Aat II   AlwN I   ApaL I   Ase I   Ava III   Bal I   BamH I


bDV X 


nk-t r T T 

oJDV X _L 


Bcl I   Bgl II   Bsm I   BspH I   Cla I   Dra III   Eco47 III
EcoN I   EcoR I   EcoR V   HgiA I   Hinc II   Hpa I   Mst I
Nae I   Nde I   Not I   Ple I   PmaC I   PpuM I   Pst I
Pvu I   Pvu II   Sac I   Sac II   Sal I   Sca I   Sma I
SnaB I   Spe I   Ssp I   Taq II   Tth111 I   Tth111 II   Xho II
Xma I   Xmn I
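
A summary such as Table 24 can in principle be produced by scanning the gene sequence for each enzyme's recognition sequence. The following is only a minimal sketch under stated assumptions: the function name `find_sites` and the small `ENZYMES` dictionary are hypothetical, only unambiguous palindromic recognition sequences are handled (so one strand suffices), and the position reported is the 1-based index of the first base of each match, which is not necessarily the position convention used in Table 24. The test fragment is codons 27 to 33 of the ipbd gene as given in Table 25.

```python
def find_sites(seq, site):
    """Return the 1-based start position of every occurrence of `site` in `seq`."""
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + 1)          # convert 0-based index to 1-based
        start = seq.find(site, start + 1)    # step by one so overlapping hits are kept
    return positions


# Recognition sequences of three well-known enzymes (illustrative subset only).
ENZYMES = {"Xho I": "CTCGAG", "Kpn I": "GGTACC", "EcoR I": "GAATTC"}

# Codons 27-33 of Table 25: TTC TGT CTC GAG CCG CCA TAT
fragment = "TTCTGTCTCGAGCCGCCATAT"

for name, site in ENZYMES.items():
    hits = find_sites(fragment, site)
    if hits:
        positions = " ".join(str(p) for p in hits)
        print(f"Enz = {name} has {len(hits)} observed sites : {positions}")
    else:
        print(f"{name} does not cut")
```

Enzymes with degenerate recognition sequences (for example PflM I or Sfi I) would need pattern matching rather than an exact string search.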
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Table 25: Annotated Sequence of ipbd gene (SEQ ID NO: 152) Protein 

Sequence SEQ ID NO: 153 



5'- C | GGA | CCG | TAT | CCA | GGC | TTT | ACA | CTT | TAT |   28
      |  Rsr II   |                 |    -35    |

      | GCT | TCC | GGC | TCG | TAT | AAT | GTG | TGG |   52
                              |    -10    |

      | AAT | TGT | GAG | CGG | ATA | ACA | ATT |   73
      |              lac operator               |

      | CCT | AGG | AGG | CTC | ACT |   88
      |  Avr II   |
            |   S.D.    |

      |  m  |  k  |  k  |  s  |  l  |  v  |  l  |  k  |  a  |  s  |
      |  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  | 10  |
      | ATG | AAG | AAA | TCT | CTG | GTT | CTT | AAG | GCT | AGC |  118
                                          |  Afl II   |   Nhe I   |

      |  v  |  a  |  v  |  a  |  t  |  l  |  v  |  p  |  m  |  l  |
      | 11  | 12  | 13  | 14  | 15  | 16  | 17  | 18  | 19  | 20  |
      | GTT | GCT | GTC | GCG | ACC | CTG | GTA | CCG | ATG | CTG |  148
                  |      Nru I      |      Kpn I      |

      |  s  |  f  |  a  |  r  |  p  |  d  |  f  |  c  |  l  |  e  |
      | 21  | 22  | 23  | 24  | 25  | 26  | 27  | 28  | 29  | 30  |
      | TCT | TTT | GCT | CGT | CCG | GAT | TTC | TGT | CTC | GAG |  178
                        |     Acc III     |           |   Ava I   |
                                                      |   Xho I   |

      |  p  |  p  |  y  |  t  |  g  |  p  |  c  |  k  |  a  |  r  |
      | 31  | 32  | 33  | 34  | 35  | 36  | 37  | 38  | 39  | 40  |
      | CCG | CCA | TAT | ACT | GGG | CCC | TGC | AAA | GCG | CGC |  208
            |        PflM I         |                 |  BssH II  |
                              |     Dra II      |
                              |      Pss I      |
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Table 25, continued 



      |  i  |  i  |  r  |  y  |  f  |  y  |  n  |  a  |  k  |
      | 41  | 42  | 43  | 44  | 45  | 46  | 47  | 48  | 49  |
      | ATC | ATC | CGT | TAT | TTC | TAC | AAC | GCT | AAA |  235

      |  a  |  g  |  l  |  c  |  q  |  t  |  f  |  v  |  y  |  g  |  g  |
      | 50  | 51  | 52  | 53  | 54  | 55  | 56  | 57  | 58  | 59  | 60  |
      | GCA | GGC | CTG | TGC | CAG | ACC | TTT | GTA | TAC | GGT | GGT |  268
      |      Stu I      |                       |   Acc I   |
                                                |   Xca I   |

      |  c  |  r  |  a  |  k  |  r  |  n  |  n  |  f  |  k  |
      | 61  | 62  | 63  | 64  | 65  | 66  | 67  | 68  | 69  |
      | TGC | CGT | GCT | AAG | CGT | AAC | AAC | TTT | AAA |  295
                  |      Esp I      |

      |  s  |  a  |  e  |  d  |  c  |  m  |  r  |  t  |  c  |  g  |
      | 70  | 71  | 72  | 73  | 74  | 75  | 76  | 77  | 78  | 79  |
      | TCG | GCC | GAA | GAT | TGC | ATG | CGT | ACC | TGC | GGT |  325
      |     Xma III     |     |      Sph I      |

      |  g  |  a  |  a  |  e  |  g  |  d  |  d  |
      | 80  | 81  | 82  | 83  | 84  | 85  | 86  |
      | GGC | GCC | GCT | GAA | GGT | GAT | GAT |  346
      |   Bbe I   |
      |   Nar I   |

      |  p  |  a  |  k  |  a  |  a  |
      | 87  | 88  | 89  | 90  | 91  |
      | CCG | GCC | AAA | GCG | GCC |  361
      |            Sfi I            |

      |  f  |  n  |  s  |  l  |  q  |  a  |  s  |  a  |  t  |
      | 92  | 93  | 94  | 95  | 96  | 97  | 98  | 99  | 100 |
      | TTT | AAC | TCT | CTG | CAA | GCT | TCT | GCT | ACC |  388
                              |     Hind 3      |

      |  e  |  y  |  i  |  g  |  y  |  a  |  w  |
      | 101 | 102 | 103 | 104 | 105 | 106 | 107 |
      | GAA | TAT | ATC | GGT | TAC | GCG | TGG |  409
                              |      Mlu I      |
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Table 25, continued 

      |  a  |  m  |  v  |  v  |  v  |
      | 108 | 109 | 110 | 111 | 112 |
      | GCC | ATG | GTG | GTG | GTT |  424
      |           BstX I            |
      |      Nco I      |

      |  i  |  v  |  g  |  a  |  t  |  i  |  g  |  i  |
      | 113 | 114 | 115 | 116 | 117 | 118 | 119 | 120 |
      | ATC | GTT | GGT | GCT | ACC | ATC | GGT | ATC |  448

      |  k  |  l  |  f  |  k  |  k  |  f  |  t  |  s  |  k  |  a  |
      | 121 | 122 | 123 | 124 | 125 | 126 | 127 | 128 | 129 | 130 |
      | AAA | CTG | TTT | AAG | AAA | TTT | ACT | TCG | AAA | GCG |  478
                                          |     Asu II      |

      |  s  |  .  |  .  |  .  |
      | 131 | 132 | 133 | 134 |
      | TCT | TAA | TAG | TGA | GGT | TAC | CAG | TCT |  502
                              |     BstE II     |

      | AAG | CCC | GCC | TAA | TGA | GCG | GGC | TTT | TTT | TTT |  532
      |                      Trp terminator                       |

      | CCT | GAG | G -3'   539
      |    Sau I    |

Note the following enzyme equivalences,

Xma III = Eag I
Acc III = BspM II
Dra II  = EcoO109 I
Asu II  = BstB I
Sau I   = Bsu36 I

Table 26: DNA_seq1 (SEQ ID NO: 154)
Protein Sequence SEQ ID NO: 155

5'-   | ccg | tcc | gtC | GGA | CCG | TAT | CCA | GGC | TTT | ACA | CTT | TAT |
      |     spacer      |  Rsr II   |                 |    -35    |

      | GCT | TCC | GGC | TCG | TAT | AAT | GTG | TGG |
                              |    -10    |

      | AAT | TGT | GAG | CGG | ATA | ACA | ATT |
      |              lac operator               |

      | CCT | AGG |
      |  Avr II   |

                        |  s  |  k  |  a  |
                        | 128 | 129 | 130 |
      | gcc | gct | ccT | TCG | AAA | GCG |
      |     spacer      |  Asu II   |

      |  s  |  .  |  .  |  .  |
      | 131 | 132 | 133 | 134 |
      | TCT | TAA | TAG | TGA | GGT | TAC | CAG | TCT |
                              |     BstE II     |

      | AAG | CCC | GCC | TAA | TGA | GCG | GGC | TTT | TTT | TTT |
      |                      Trp terminator                       |

      | CCT | GAG | Gca | ggt | gag | cg -3'
      |    Sau I    |      spacer     |

Table 27: DNA_synth1 (SEQ ID NO: 154)



10 



5 ' | CCG | TCC 1 GTC [ GGA | CCG [ TAT | CCA [ GGC ] TTT | ACA | CTT | TAT [ 



| GCT | TCC | GGC [ TCG [ TAT ] AAT | GTG 1 TGG | 



| AAT 1 TGT j GAG \ CGG | ATA | ACA | ATT [ 

olig#4 '4&'Ii^^©^ = 3 1 - gt taa 



15 



20 



| CCT | AGG | 
gga tec 



/ 3 1 = olig#3 (SEQ ID NO .161) 
GCC 1 GCT | CCT | TCG [ A AA | GCG | 
c 99 cga gga age ttt cgc 



| TCT | TAA | TAG | TGA | GGT | TAC | CAG | TCT | 
25 aga att ate act cca atg gtc aga 



| AAG | CCC | GCC j TAA | TGA | GCG | GGC | TTT | TTT | TTT | 
ttc ggg egg att act cgc ccg aaa aaa aaa 



30 



CCT | GAG | GCA | GGT | GAG | CG 

gga etc cgt cca etc gc - 5 1 (SEQ ipj^JJg Sj 



"Top" strand      99
"Bottom" strand   100
Overlap           23 (14 c/g and 9 a/t)
Net length        158



Table 28: DNA_seq2 (SEQ ID NO: 157) 
Protein sequence: SEQ ID NO: 158 

5 1 - | gca | cca | acg | 
| spacer |_ 



| CCT | AGG| AGG | CTC | ACT | 
| Avr I I | 

1 S- D. | 

| m | k | k | s | 1 | v | 1 | k 
|1|2|3|4|5|6|7|8 
j ATG I AAG I AAA | TCT | CTG | GTT | CTT | AAG 

Afl II 



I v| a| v| a | t | 1 | v|p 

j llj 12 j 13 | 14 | 15| 16| 17| 18 
| GTT | GCT | GTC j GCG j ACC j CTG j GTA j CCG 
1 Nru I 1 1 Kpn I [ 

I s | f | a | r | p | d | f | c 

I 21 | 22 j 23 j 24 | 25 j 26 j 2 7 | 28 
j TCT j TTT j GCT j CGT j CCG j QAT j TTC j TQT 

| AccIII | 



P I P I y I t | g | p | c | k 

i 31 j 32 j 33 j 34 | 35 j 36 j 37 j 38 
i CCG | CCA | TAT | ACT j GGG | CCC j TGC | AAA 

1 PflM I [ 

I Apa I | 



a | s | 
9 | 10 | 
GCT | AGC | 
Nhe I | 



Dra II 



m | 1 | 
19 | 20 | 
ATG | CTG | 



1 | e | 
29 | 30 | 
CTC j GAG | 
Ava I 



Xho I 



a | r | 
39 | 40 | 
GCG | CGC | 
BssH II 



| Pss I | 



35 | i | i | r | 
| 4l| 42 | 43 j 
j ate | ate | cgt | 



40 



I t | s | k | 
| 127 | 128 | 129 | 
j ACT | TCG j AAa | gcg | get | gcg | 
1 Asu II 1 spacer | 
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Table 29: DNA_synth2 (SEQ. ID NO: 157) 



5* - | GCA | CCA | ACG | 



10 



20 



CCT | AGG | AGQ [ CTC | ACT 



ATG AAG AAA TCT CTG GTT 



| GTT | GCT 1 GTC | GCG | ACC | CTG 



15 olig#6 (SEQ ID NO: 160)= 

/ 3 

| TCT 1 TTT | GCT [ CGT | CCG | GAT 
aga aaa cga gca ggc eta 



| CCG | CCA | TAT | ACT | GGG | CCC 
99 c ggt ata tg a ccc 999 a cg ttt cgc gcg 



CTT 1 AAG | GCT | AGC | 



GTA | CCG | ATG | CTG j 



3 1 - ggc tac gac 

= olig#5 (SEQ ID. NO. 162) 
TTC | TGT | CTC | GAG | 
aag aca gag etc 



TGC | AAA | GCG | CGC | 



25 



| ATC | ATC | CGT | 
tag tag gca 



30 



| ACT | TCG | AAA | GCG | GCT | GCG | 
tga age ttt cgc cga cgc - 5 1 



"Top" strand      99
"Bottom" strand   99
Overlap           24 (14 c/g and 10 a/t)
Net length        155
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Table 30: DNA_seq3 (SEQ ID NO: 163) 
Protein sequence = SEQ ID NO: 164 



10 



15 



20 



25 



30 



35 



I a I r I 
| 39 | 40 | 

5 ' - | ccc | tgc | aca | GCG | CGC | 
[ spacer |BssH II 



|i|i|r|y|f|y|n 

| 41 | 42 | 43 | 44 | 45 | 46 | 47 

| ATC | ATC j CGT | TAT | TTC j TAC | AAC 

|ajg|l|c|q|t|f 
j 50 j 51 | 52 | 53 j 54 | 55 j 56 
| GCA j GGC | CTG j TGC j CAG | ACC j TTT 
| Stu I | 



|c|r|a|k|r|n|n 
j 6l| 62 | 63 | 64 | 65 j 66 j 67 
| TGC | CGT | GCT | AAG j CGT j AAC j AAC 

I ssp I L 

|s|a|e|d|c|m|r 
j 70 | 71 j 72 | 73 j 74 | 75 j 76 
j TCG | GCC j GAA | GAT j TGC j ATG | CGT 
IXmalll | | Sph I | 

I 9 I a | 

| 80 | 81 | 

j GGC j GCC j get | gaa | 

| Bbe I 1 spacer 



Nar I 



a | k | 
48| 49 j 
GCT j AAA | 

v I y I g I g I 

57| 58 I 59 | 60 | 
GTA | TAC j GGT | GGT | 
Acc I 



Xca I 



f | k | 
68 | 69 | 
TTT j AAA | 



t I c | g | 
77| 78 j 79 | 
ACC j TGC | GGT j 



I t | s | k | 
| 127 | 128 | 129 | 
| ttt | acT | TCG j AAa j gcg | teg | ccg | 
[Asu Il| 



- 3 
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Table 31: DNA_synth3 (SEQ ID NO: 163) 
5 5 1 - 1 CCC 1 TGC 1 ACA | GCG | CGC ] 



| ATC | ATC | CGT | TAT | TTC [ TAC | AAC | GCT [ AAA | 

10 

1 GCA 1 GGC | CTG | TGC | CAG | ACC | TTT 1 GTA | TAC [ GGT | GGT | 
olig#8 (SEQ ID NO: 166)= 3 ! - g cca cca 

/ 3' = olig#7 (SEQ ID NO: 167) 
15 1 TGC 1 CGT | GCT | AAG | CGT ] AAC | A AC \ TTT | AAA | 
acg gca cga ttc gca ttg ttg aaa ttt 

| TCG | GCC | GAA | GAT | TGC | ATG | CGT | ACC | TGC | GGT | 
2 0 age egg ctt eta acg tac gca tgg acg cca 

| GGC | GCC | GCT | GAA | 
ccg egg cgt ctt 

25 

| TTT | ACT | TCG | AAA | GCG | TCG | CCG | 
aaa tga age ttt cgc age ggc -5' 

"Top" strand      93
"Bottom" strand   97
Overlap           25 (15 g/c & 10 a/t)
Net length        146
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10 



15 



20 



25 



30 



cct | cgc | cct 
spacer 



i f 
| 92 

TTT 



I e 
j 101 

GAA 



I a 
| 108 
|GCC 



P I a 
87 j 88 
CCG j GCC 
I Sf i 



n | s 
93 | 94 
AAC I TCT 



Y I i 
102 | 103 

TAT ATC 



m | v 
109 | 110 
ATG | GTG 
BstX I 



Nco I 



Table 32: DNA_seq4 (SEQ ID NO: 159) 
Protein sequence = SEQ ID NO: 165 

9 I a | a | e | g | d | d | 

80 | 81 | 82 j 83 j 84 j 85 j 86 j 
GGC j GCC j GCT j GAA j GGT j GAT j GAT j 
Bbe I | 



Nar I 



k 
89 
AAA 
I 



1 

95 
CTG 



9 
104 
GGT 



v 
111 
GTG 



a | a | 
90 | 91 | 
GCG GCC 



q | a | s | a | t | 
96 j 97 j 98 j 99|100| 
CAA | GCT | TCT j GCT | ACC | 
| Hind 3 [ 

y | a | w | 
105 j 106 j 107 j 
TAC | GCG | TGG | 

| Mlu I | 

v | 
112 | 
GTT 





1 i 


v | g 


a 


t | i | g 


i 






| 113 


114 j 115 


116 


117 | 118 | 119 


120 






j ATC 


GTT | GGT 


GCT 


ACC | ATC j GGT 


ATC 




35 
















1 k 


1 i f 


k 


k | f | t 




k I 




| 121 


122 | 123 


124 


125 | 126 | 127 


128 


129 | 




j AAA 


CTG j TTT 


AAG 


AAA j TTT | ACT 


TCG 


Ma j gcg | teg | ggc 



- 3 



| Asu II | spacer
Table 33: DNA_synth4 (SEQ ID NO: 159)
5 5 1 | GCT | CGC | CCT | GGC | GCC | GCT | GAA [ GGT [ GAT | GAT | 
| CCG 1 GCC | AAA j GCG | GCC [ 

10 

1 TTT | AAC | TCT | CTG | CAA | GCT | TCT | GCT | ACC | 



1 GAA 1 TAT | ATC [ GGT | TAC [ GCG | TGG | 
15 olig#10 = 3 1 - ata tag cca atg cgc acc 
(SEQ ID NO:168) 

/ 3» = olig#9 (SEQ ID NO: 169) 
| GCC | ATG ] G TG | GTG | GTT | 
2 0 egg tac cac cac caa 



| ATC | GTT | GGT | GCT | ACC | ATC | GGT | ATC | 
tag caa cca cga tgg tag cca tag 

25 

| AAA | CTG | TTT | AAG | AAA | TTT | ACT | TCG | AAA | GCG | TCT | TGA | 
ttt gac aaa ttc ttt aaa tga age ttt cgc aga act - 5 

"Top" strand      100
"Bottom" strand   93
Overlap           25 (14 c/g and 11 a/t)
Net length        149



Table 34: Some interaction sets in BPTI 



Number 





4± 

TT 


Dif f . 
AAs 


Pont" ^nfc s 


BPTI 


1 


2 


3 


4 


5 




- R 


9 


D -32 
















- A 


9 


E -32 
















_ T 
— > 




T P F Z -29 


















X VJ 


7.1 R"* 02 T2 H G L K E -18 














i n 

X W 


_ *i 


i o 

X V 


D4 T2 P2 02 E G N K R -18 
















t 
X 


i n 

X 


R91 A9 K9 H9 P L I T G D 


R 










5 




o 


q 


P9 0 R4 A9 H9 N E V F L 

XTZ \J XV a ^i^< 11^ XM i— J V X X_l 


p 










s 5 






X VJ 


D1 R Kfi T*} R9 P9 ^ Y G A I. 


D 








4 


s 




rr 


7 


FT 9 D4 "L3 Y2 T2 A2 S 


F 








s 


5 


x 3 


IT 

D 


n 

X 




c 








x 


x 




D 


X V/ 


T.11 E5 I\T4 K3 02 12 Y2 D2 T R 


L 








4 






"7 
/ 


~j 


T.l ft Ell K2 S O 


E 






s 


4 






Q 
O 


7 


P9fi H2 A2 I L G F 


P 






3 


4 






Q 


9 


P17 A6 V3 R2 Q L K Y F 


P 




s 


3 


4 






i n 


10 


Yll E7 D4 A2 N2 R2 V2 SID 


Y 


s 




s 


4 






1 1 
X X 


10 


T17 P5 A3 R2 I S Q Y V K 


T 


1 


s 


3 


4 








2 


G3 2 K 


G 


x 




X 


x 






X .j 


5 


P22 R6 L3 N I 

XT IV w X_l *—/ li 


p 


1 




s 


4 


s 






-* 
_> 


C3 1 T A 


c 


1 




s 


s 


5 


9 R 


1 R 


12 


K15 R4 Y2 M2 L2 -2 VGA I N 


F K 


1 


s 


3 


4 


s 




_L O 


7 


A99 GS D2 R K D F 


A 


1 


s 


s 


s 


5 




1 7 
X / 


1 9 


R12 K5 A2 Y3 H2 S2 F2 L M T G 


P R 


1 


2 


3 




s 




1 ft 




121 M4 F3 L2 V2 T 


I 


1 


s 


s 




5 




X .7 


7 


111 P10 R6 S2 K2 L O 


I 


1 


2 


3 




s 


7 n 




_j 


R1 9 A7 S4 L2 O 


R 


s 


s 


s 




5 




9 1 


4 


Ylft F13 W I 

_L ^ u J- -J- rv x 


Y 




2 


s 


s 


s 




9 9 


D 


F1 4 Y1 4 H9 A N S 

17 X ^ X X "X XX XM k— ' 


F 




s 


3 


4 






9 7 




Y3 2 F 


Y 






s 


s 






9 4. 


4 


N2 6 K3 D3 S 


N 




s 


3 






J — ' 


25 


10 


A12 S5 Q3 P3 W3 L2 T2 K G R 


A 






s 


s 






9 


9 


K16 A6 T2 E2 S2 R2 G H V 


K 




s 


3 


4 






97 
z. / 


5 


A18 S8 K3 L2 T2 


A 




2 


3 


4 






9ft 


7 


G13 K10 N5 02 R H M 


G 




2 


s 


s 






29 


10 


L9 Q7 K7 A2 F2 R2 M G T N 


L 




2 


3 






40 


30 


1 


C33 


C 




X 


X 


X 






31 


7 


Q12 Ell L4 K2 V2 Y N 


Q 




2 


3 


4 






32 


11 


T12 P5 K4 Q3 E2 L2 G V S R A 


T 




2 


3 


s 






33 


1 


F33 


F 


X 


X 


X 


X 






'34 


11 


Vll 18 T3 D2 N2 Q2 F H P R K 


V 


1 


2 


3 


s 




45 


35 


2 


Y31 W2 


Y 


s 


s 


s 




5 




36 


3 


G27 S5 R 


G 


1 












37 


1 


G3 3 


G 


X 








X 




38 


3 


C31 T A 


C 


1 






s 


5 




39 


7 


R13 G9 K4 Q3 D2 P M 


R 


1 






4 


s 
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Table 34: continued. 



5 Number 
Res. Diff. 

# AAs Contents BPTI 1 2 3 4 5 





40 


2 


G22 


All 


A 


s 


s 


5 


1 0 


4 1 


3 


N2 0 


Kll D2 


K 




4 


s 




4.9 


9 


All 


R9 S4 G3 H2 D O K N 


R 




s 


5 




43 


2 


N31 


G2 


N 






s 




44 


3 


N21 


Rll K 


N 






s 




45 


2 


F32 


Y 


F 






s 


15 


46 


8 


K24 


E2 S2 D H V Y R 


K 






5 




47 


2 


T19 


S14 


s 


s 




5 




48 


9 


All 


19 E4 T2 W2 L2 R K D 


A 


2 


s 


s 






7 


Pi 9 


Dfi A2 02 K2 T H 


E 


2 




s 




50 


6 


E16 


D12 L2 M Q K 


D 


s 




5 


20 


51 


1 


C33 




C 


X 




X 




52 


7 


R13 


M10 L3 E3 Q2 H V 


M 


2 




s 




53 


8 


R21 


Q3 E2 H2 C2 G K D 


R 


s 




5 




54 


7 


T2 3 


A3 V2 E2 I Y K 


T 






5 




55 


1 


C33 




C 






X 


25 


56 


8 


G15 


V8 13 E2 R2 A L S 


G 










57 


8 


G19 


V4 A3 P2 -2 R L N 


G 










58 


8 


All 


-10 P3 K3 S2 Y2 R F 


A 










59 


9 


-24 


G2QEAYSPR 












60 


6 


-28 


Q R I G D 










30 


61 


3 


-31 


T P 












62 


2 


-32 


D 












63 


2 


-32 


K 












64 


2 


-32 


S 











s indicates secondary set
x indicates in or close to surface but buried and/or highly conserved.
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Table 35: 

Distances from Cβ to
Tip of Side Group in Å

Amino Acid type   Distance

A                 0.0
C (reduced)       1.8
D                 2.4
E                 3.5
F                 [illegible]
G
H                 4.0
I                 2.5
K                 5.1
L                 2.6
M                 3.8
N                 2.4
P                 2.4
Q                 3.5
R                 6.0
S                 1.5
T                 1.5
V                 1.5
W                 5.3
Y                 5.7

Notes: These distances were calculated for standard model
parts with all side groups fully extended.
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Table 36: Distances, BPTI residue set #2 
Distances in Å between Cβ.
Hypothetical Cβ was added to each Glycine.

        R17   I19   Y21   A27   G28   L29   Q31   T32   V34   A48
I19     7.7
Y21    15.1   8.4
A27    22.6  17.1  12.2
G28    26.6  20.4  13.8   5.3
L29    22.5  15.8   9.6   5.1   5.2
Q31    16.1  10.4   6.8   6.8  10.6   6.8
T32    11.7   5.2   6.1  12.0  15.5  10.9   5.4
V34     5.6   6.5  11.6  17.6  21.7  18.0  11.4   8.2
A48    18.5  11.0   5.4  12.6  13.3   8.4   8.8   8.3  15.7
E49    22.0  14.7   8.9  16.9  16.1  12.2  13.9  13.3  19.8   5.5
M52    23.6  16.3   8.6  12.2  10.3   7.6  11.3  13.2  20.0   6.2
P9     14.0  11.3   9.0  12.2  15.4  13.3   7.9   9.2   8.7  13.9
T11     9.5  11.2  13.5  18.8  22.5  19.8  13.5  12.1   5.7  18.5
K15     7.9  14.6  20.1  27.4  31.3  27.9  21.4  18.1  10.3  24.6
A16     5.5  10.1  15.9  25.2  28.5  24.6  18.6  14.5   8.6  19.8
I18     6.1   6.0  11.2  21.3  24.4  20.2  14.7  10.4   7.0  15.0
R20    10.6   5.9   5.4  16.0  18.5  14.6   9.8   6.9   7.8  10.2
F22    15.6  10.9   5.6  10.5  12.8  10.3   6.2   8.1  10.8  10.3
N24    19.9  14.7   9.4   4.1   7.3   6.1   4.8  10.0  14.7  11.4
K26    24.4  20.1  15.2   5.4   7.7   9.8  10.1  15.3  19.0  17.0
C30    18.9  12.1   4.6   8.8   9.5   5.3   5.9   8.2  14.9   4.9
F33    10.8   7.4   7.7  12.6  16.4  13.0   6.6   5.6   5.5  12.2
Y35     8.4   7.4   9.4  18.4  21.4  17.9  12.2   9.5   5.8  14.4
S47    17.6  10.6   6.6  17.3  17.9  13.4  12.6  10.4  15.9   5.3
D50    20.0  13.6   7.2  17.2  16.8  13.5  13.5  12.9  17.6   7.6
C51    18.9  12.2   4.0  12.1  12.2   8.8   8.8   9.7  15.3   5.4
R53    25.4  18.6  11.0  17.2  15.0  13.0  15.7  16.7  22.3   9.7
R39    15.4  16.9  17.1  24.9  27.2  24.9  20.1  18.7  13.8  22.3
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Table 36, continued. 

Distances in Å between Cβ.
Hypothetical Cβ was added to each Glycine.

        E49   M52    P9   T11   K15   A16   I18   R20   F22   N24
M52     6.1
P9     17.7  15.5
T11    22.1  21.5   7.2
K15    27.5  28.7  16.4   9.5
A16    22.2  24.2  14.9   9.8   6.2
I18    17.4  19.5  12.2   9.5  10.4   4.9
R20    13.0  13.8   8.0   9.4  14.9  10.6   6.2
F22    13.8  11.4   4.1  10.6  19.1  16.3  12.7   6.9
N24    15.6  11.2   8.4  15.3  24.1  21.9  18.2  12.7   6.6
K26    20.9  15.7  12.1  18.6  27.9  26.6  23.3  18.1  11.6   5.9
C30     8.7   5.6  10.6  16.6  24.1  20.2  15.7   9.8   6.8   6.9
F33    16.5  15.4   4.2   7.1  15.0  12.8   9.6   6.1   5.6   9.3
Y35    17.2  17.8   7.8   5.8  11.0   7.6   4.9   4.3   8.8  14.8
S47     4.7   9.1  15.3  18.5  23.1  17.6  12.8   9.1  12.0  15.3
D50     5.5   7.7  14.7  18.6  24.2  19.2  14.7   9.9  11.0  14.7
C51     7.1   5.4  11.0  16.4  23.5  19.2  14.6   8.7   6.9   9.6
R53     6.3   5.6  17.9  23.1  29.6  24.8  20.3  15.0  13.8  15.5
R39    23.9  24.0  13.0   9.5  12.0  11.8  12.5  12.8  14.7  20.8

        K26   C30   F33   Y35   S47   D50   C51   R53
C30    12.4
F33    13.9  10.1
Y35    19.5  13.5   6.4
S47    21.0   8.8  13.5  13.2
D50    20.1   8.6  14.3  13.7   5.0
C51    15.0   3.7  10.9  12.5   6.9   5.2
R53    19.9   9.9  18.2  18.8   9.4   5.8   7.4
R39    24.3  20.6  14.4   9.6  20.4  19.0  18.8  23.4
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Table 37: vgDNA to vary BPTI set #2.1 (SEQ ID NO: 170) 
Protein sequence = SEQ ID NO: 171 

|g|p|c|k|a|x| 
5 j 35 | 36 | 37 j 38 | 39| 40 j 

5 1 - | CAC 1 CCT 1 GGG | CCC [ TGC [ AAA | GCG | qf k | 208 
| spacer | Apa I | 

10 |i|x|r|y|f|y|n|a|k| 
j 41 j 42 | 43 j 44 j 45 j 46 j 47 j 48 j 49 | 

| ATC | qf k 1 CGT | TAT | TTC | TAC | AAC | GCT | AAA | 2 35 

/ 3' = olig#27 72 nts 
15 + I + | + (SEQ ID NO: 172) 

|x|g|X|c|q|t|f|x|y|g|g| 

j 50 j 5lj 52 j 53 j 54 | 55 j 56 j 57 | 58 | 59 | 60 | 

1 qf k | GGt | qf k | TGC | GAG | ACC | TTc j qf k j TAC j GGT j GGT j 2 68 

olig#28= 3'- acg gtc tgg aag **m atg cca cca 
20 78 nts (SEQ ID NO:173) 

Overlap =12 (7 CG, 5 AT) 

I c |r | a |k| r |n|n | f | k| 

25 j 61 | 62 j 63 | 64 j 65 j 66 | 67 | 68 | 69 | 

| TGC j CGT | GCT j AAG j CGT | AAC | AAC j TTT | AAA | 2 95 

acg gca cga ttc gca ttg ttg aaa ttt 
1 Esp I L 

30 + 

|s|x|e|d|c|m| 
j 70 | 71| 72 j 73 | 74 j 75 j 

| TCT j qf k | GAG j GAT j TGC j ATG j C 322 
age **m etc eta acg tac gca ccc acc -5 1 

3 5 | Sph I | spacer | 

k = equal parts of T and G; m = equal parts of C and A;
q = (.26 T, .18 C, .26 A, and .30 G);
f = (.22 T, .16 C, .40 A, and .22 G);
* = complement of symbol above

Residue          40     42     50     52     57     71
Possibilities    21  x  21  x  21  x  21  x  21  x  21 = 8.6 x 10^7
Abundance x 10
of PPBD         .768   .271   .459   .671   .600   .459
Product = 1.77 x 10^-8
Parent = 1/(5.5 x 10^7)   least favored = 1/(4.2 x 10^9)
Least favored one-amino-acid substitution from PPBD present at 1 in 1.6 x 10^7
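
The library-size arithmetic in Table 37 can be checked with a few lines of code. This is only a sketch under stated assumptions: the variable names are hypothetical, the value 21 per varied codon is taken to mean the 20 amino acids plus one stop codon, and each "Abundance x 10" entry is divided by 10 to obtain the fraction of the parental (PPBD) amino acid at that residue.

```python
from math import prod

# "Abundance x 10 of PPBD" row of Table 37, keyed by residue number.
abundance_x10 = {40: 0.768, 42: 0.271, 50: 0.459, 52: 0.671, 57: 0.600, 71: 0.459}

# Assumed meaning of "21": 20 amino acids plus one stop codon per varied position.
choices_per_residue = 21

# Number of distinct sequences when six residues are varied independently.
n_sequences = choices_per_residue ** len(abundance_x10)

# Expected fraction of the population carrying the parental residue at all six positions.
parent_fraction = prod(a / 10 for a in abundance_x10.values())

print(f"possibilities = {n_sequences:.1e}")
print(f"product       = {parent_fraction:.2e}")
```

The two printed values correspond to the "Possibilities" and "Product" lines of the table; the reciprocal of the product gives the order of magnitude quoted for the parent sequence.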
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Table 38: Result of varying set#2 of BPTI 2.1 
DNA Sequence = (SEQ ID NO: 174) 
Protein Sequence = SEQ ID NO: 175 
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Nar I 



g | p | c | k 
35 ( 36 j 37 j 38 
GGG CCC TGC AAA 



1 | e 
29 1 30 
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Ava I 



Xho I 



a | D 
39 j 40 
GCG j GAT 
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43 j 44 


45 


46 | 47 


48 


49 




ATC 


CAG 
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c 
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g I 
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235 



268 



295 



325 
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Table 39: vgDNA to vary set#2 BPTI 2.2 (SEQ ID NO: 176) 
Protein sequence = SEQ ID NO: 177 



|g|p|c|x|a|D| 
| 35 | 36 j 37 j 38 | 39 j 40 | 
5 ' - eg gca cgc [ GGG | CCC | TGC [ mrA | GCG [ GAT | 
1 spacer [ Apa I | 
+ + + 

|X|Q|X|x|f|y|n|a|k| 
j 41 | 42 | 43 | 44| 45 | 46 | 47 | 48 | 49 | 
| rwA 1 CAG 1 rvk 1 TwT | TTC | TAC | AAC | GCT | AAA | 



208 



235 



+ + + 

|E|x|L|c|x|x|f|S|y|g|g| 
15 j 50 j 51 j 52 j 53 j 54 ( 55 j 56 | 57 j 58 | 59 | 60 | 
| GAG | qf k | CTG | TGC | qf k [ qf k | TTT | TCG | TAC 1 GGT | GGT | 
61 nts olig#30 (SEQ ID NO: 178) 3*- g cca cca 



268 



20 



25 



30 



35 



40 



45 



Overlap =15 (11 CG, 4 AT) 



/- 3' olig#29 94 nts 
|c|r|a|k|r|n|n|f | 
| 61 | 62 | 63 | 64 | 65 | 66 | 67 | 68 | 69 | 
| TGC | CGT | GC T j AAG j CGT j AAC | AAC j TTT | AAA j 
acg gca cga ttc gca ttg ttg aaa ttt 
I Esp I | 
+ 

|s|w|x|d|c|m| 
j 70 | 71 | 72 j 73 | 74 | 75 j 
| TCG | TGG j qf k j GAT j TGC | ATG j C 
age acc **m eta acg tac gcg acc tgc 

| Sph I | spacer | 



(SEQ ID NO: 179) 



295 



-5 • 



k = equal parts of T and G; v = equal parts of C, A, and G;
m = equal parts of C and A; r = equal parts of A and G;
w = equal parts of A and T;
q = (.26 T, .18 C, .26 A, and .30 G);
f = (.22 T, .16 C, .40 A, and .22 G);
* = complement of symbol above



Residue 
Possibilities 



38 41 43 44 51 
4 x 4 x 9 x 2x21 



Abundance x 10 2.5 2.5 .833 
Product = 2.3 x 10" 8 



663 .397 



54 55 72 
x 21 x 21 x 21 

= 6.2 x 10' 
437 .602 



Parent = 1/(4.4 x 10 7 ) least favored = 1/(1.25 x 10 9 ) 

Least favored one-amino-acid substitution from PPBD present at 1 

1.2 x 10 7 



xn 
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Table 40: Result of varying set#2 of BPTI 2.2 
DNA sequence = SEQ ID NO: 180 
Protein sequence = SEQ ID NO: 181 

I 1 I e | 
| 29 | 30 | 

j CTC | GAG j 178 
| Xho I 1 

10 

|p|p|y|t|g|p|c|E|a|D| 
j 31 j 32 j 33 | 34 j 35 | 36 | 37 | 38 | 39| 40 | 

| CCG j CCA | TAT j ACT j GGG j CCC j TGC j GAG j GCG j GAT | 2 08 

15 1 PflM I |_ 

| Apa I | 

|v|Q|N|P|f|y|n|a|k| 
j 4l| 42 j 43 | 44 j 45 j 46 j 47 j 48 j 49 j 
20 | GTT | CAG j AAT j TTT j TTC j TAC | AAC j GCT j AAA j 235 



|E|F|L|c|s|A|f|S|y|g|g| 
j 50 j 51 | 52 | 53 | 54 | 55 j 56 j 57 j 58 | 59 | 60 | 
25 | GAG | TTT j CTG | TGC j TCT | GCT j TTT j TCG j TAC j GGT j GGT | 268 



I c | r | a | k| r | n | n | f | k| 

| 61 | 62 j 63 j 64 | 65 | 66 | 67 | 68 | 69 1 
3 0 | TGC | CGT | GCT j AAG j CGT j AAC j AAC | TTT j AAA j 2 95 

1 ESP I L 



I s | W | Q | d | c | m | r | t | c | g | 

35 j 70 j 7l| 72 j 73 j 74 j 75 j 76 j 77 j 78 j 79 j 

j TCG j TGG | CAG | GAT j TGC | ATG | CGT j ACC j TGC | GGT j 32 5 

I Sph I| 



40 | g | a | 
j 80 j 81 I 
j GGC | GCC j 
| Bbe I | 
[ Nar I 
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Table 41: vg DNA set#2 of BPTI 2.3 (SEQ ID NO:182) 
Protein sequence = SEQ ID NO: 183 

I 1 I e | 

5 | 29 \ 30| 

5'- eg age ctg[CTC|GAG| 178 
| spacer [ Xho I | 

+ + + 

10 |p|x|y|X|g|p|c|E|a|x| 
j 31 j 32 | 33 | 34| 35 j 36 j 37 j 38 | 39 | 40 j 

| CCG 1 vmg | TAT | wig ] GGG j CCC 1 TGC | GAG [ GCG | qf k | 2 08 

+ 

15 | V | Q | N | X | f | y | n | a | k | 
j 4l| 42 j 43 j 44 j 45 j 46 j 47| 48 j 49 j 

1 GTT | CAG | AAT | Tdk | TTC | TAC | AAC 1 GCc | AAg [ -3' olig#33 71 nts 
67 nts olig#34 3'- g atg ttg egg ttc (SEQ ID NO: 184) 

(SEQ ID NO: 185) 

20 

Overlap = 13 (7 CG, 6 AT) 

+ + + + 

|x|F|x|c|S|x|f|x|y|g|g| 
25 j 50 | 5l| 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 | 

| Vag | TTT | nTk | TGC | TCT j qf k | TTT j qf k | TAC j GGT j GGT j 268 
btc aaa nam acg aga **m aaa **m atg cca cca 

| c | r | a | k | 
30 | 61 | 62 | 63 | 64 | 

| TGC j CGT j GCT j AAG | C 
acg gca cga ttc gcg acc ggc 5' 
1 Esp I | spacer [ 

k = equal parts of T and G; m = equal parts of C and A;
w = equal parts of A and T; n = equal parts of A,C,G,T;
d = equal parts A,G,T; v = equal parts A,C,G;
q = (.26 T, .18 C, .26 A, and .30 G);
f = (.22 T, .16 C, .40 A, and .22 G);
* = complement of symbol above

Residue          32     34     40     44     50     52     55     57
Possibilities     6  x   6  x  21  x   6  x   3  x   5  x  21  x  21 = 3 x 10^7
Abundance x 10
of PPBD         10/6   10/6   .545   10/6   10/3   30/8   .459   .701
product = 1.01 x 10^-7
parent = 1/(1 x 10^7)   least favored = 1/(4 x 10^8)
Least favored one-amino-acid substitution from PPBD present at 1 in 3 x 10^7
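
The "Possibilities" entries for the equimolar degenerate codons can be reproduced by expanding each codon and translating the results. The following is a minimal sketch, not the procedure used to prepare the tables: the names `CODON_TABLE`, `MIX` and `amino_acid_fractions` are hypothetical, the standard genetic code is assumed, a stop codon is counted as one possibility (shown as `*`), and the non-equimolar mixtures q and f are deliberately left out.

```python
from collections import Counter
from itertools import product

# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

# Equimolar mixtures as defined in the table keys (q and f are omitted).
MIX = {"A": "A", "C": "C", "G": "G", "T": "T",
       "K": "TG", "M": "CA", "W": "AT", "R": "AG",
       "D": "AGT", "V": "ACG", "N": "ACGT"}


def amino_acid_fractions(codon):
    """Fraction of each amino acid ('*' = stop) encoded by an equimolar degenerate codon."""
    pools = [MIX[symbol.upper()] for symbol in codon]
    counts = Counter(CODON_TABLE["".join(bases)] for bases in product(*pools))
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


for codon in ("vmg", "Tdk", "vag", "nTk"):
    fractions = amino_acid_fractions(codon)
    print(codon, len(fractions), sorted(fractions))
```

Under these assumptions `vmg`, `Tdk`, `vag` and `nTk` give 6, 6, 3 and 5 possibilities, as listed for residues 32, 44, 50 and 52, and leucine makes up 3/8 of the `nTk` products, which corresponds to the 30/8 entry in the "Abundance x 10" row.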
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Table 42: Result of varying set#2 of BPTI 2.3 
DNA sequence = SEQ ID NO: 186 
Protein sequence = SEQ ID NO: 187
Table 101a: VIII signal::bpti::VIII-coat gene (SEQ ID NO: 188)
pbd mod14: 9 V 89: Sequence cloned into pGEM-MB1
pGEM-3Zf(-)[HincII]::lacUV5 Sac I/gene/
TrpA attenuator/(Sal I)::pGEM-3Zf(-)[HincII]!

5'-(GAATTC GAGCTCGGTACC CGG GGATCC TCTAGAGTC)-   ! polylinker
GGC tttaca CTTTATGCTTCCGGCTCG tataat GTG          ! lacUV5
TGG aATTGTGAGCGcTcACAATT                          ! lacO-symm operator
                                                  ! M13 leader
ctt aag gct agc                                   ! 10  <- codon #
gtt gct gtc gcg acc ctg gta cct atg ttg           ! 20
tcc ttc gct cgt ccg gat ttc tgt ctc gag           ! 30
cca cca tac act ggg ccc tgc aaa gcg cgc           ! 40
atc atc cgC tat ttc tac aat gct aaa gca           ! 50
ggc ctg tgc cag acc ttt gta tac ggt ggt           ! 60
tgc cgt gct aag cgt aac aac ttt aaa tcg           ! 70
gcc gaa gat tgc atg cgt acc tgc ggt ggc           ! 80
gcc gct gaa ggt gat gat ccg gcc aaG gcg           ! 90
gcc ttc aat tct ctG caa gct tct gct acc           ! 100
gag tat att ggt tac gcg tgg gcc atg gtg           ! 110
gtg gtt atc gtt ggt gct acc atc ggg atc           ! 120
aaa ctg ttc aag aag ttt act tcg aag gcg           ! 130
tct taa tga tag GGTTACC                           ! BstEII
AGTCTA AGCCCGC CTAATGA GCGGGCT TTTTTTTT           ! terminator
aTCGA -                                           ! (Sal I ghost)
(GACCTGCAGGCATGCAAGCTT...-3')                     ! pGEM polylinker



Notes:

a Designed sequence contained AGGAGG, but sequencing indicates
that actual DNA contains AGAGG.



Table 101b: VIII-signal::bpti::VIII-coat gene (SEQ ID NO: 189)
BamHI-Sal I cassette, after insertion of Sal I linker
in Pst I site of pGEM-MB1.
pGEM-3Zf(-)[HincII]::lacUV5 Sac I/gene/
TrpA attenuator/(Sal I)::pGEM-3Zf(-)[HincII]!

5'-GAATTC GAGCTC GGTACCCGG GGATCC TCTAGA GTC-     ! BamHI
GGC tttaca CTTTATGCTTCCGGCTCG tataat GTG          ! lacUV5
TGG aATTGTGAGCGcTcACAATT                          ! lacO-symm operator
gagctc AGAGG CttaCT                               ! Sac I; Shine-Dalgarno seq.



atg 


aag 


aaa 


tct 


ctg 


gtt 


ctt 


aag 


get 


age 


gtt 


get 


gtc 


gcg 


acc 


ctg 


gta 


cct 


atg 


ttg 


tec 


ttc 


get 


cgt 


ccg 


gat 


ttc 


tgt 


etc 


gag 


cca 


cca 


tac 


act 


ggg 


ccc 


tgc 


aaa 


gcg 


cgc 


ate 


ate 


cgC 


tat 


ttc 


tac 


aat 


get 


aaa 


gca 


ggc 


ctg 


tgc 


cag 


acc 


ttt 


gta 


tac 


ggt 


ggt 


tgc 


cgt 


get 


aag 


cgt 


aac 


aac 


ttt 


aaa 


teg 


gec 


gaa 


gat 


tgc 


atg 


cgt 


acc 


tgc 


ggt 


ggc 


gec 


get 


gaa 


ggt 


gat 


gat 


ccg 


gec 


aaG 


gcg 


gec 


ttc 


aat 


tct 


ctG 


caa 


get 


tct 


get 


acc 


gag 


tat 


att 


ggt 


tac 


gcg 


tgg 


gec 


atg 


gtg 


gtg 


gtt 


ate 


gtt 


ggt 


get 


acc 


ate 


ggg 


ate 


aaa 


ctg 


ttc 


aag 


aag 


ttt 


act 


teg 


aag 


gcg 


tct 


taa 


tga 


tag 


GGTTACC 




BstEII 





10, 

20 

30 

40 

50 

60 

70 

80 

90 

100 

110 

120 

130 



M13 leader 
< - codon # 



AGTCTA AGCCCGC CTAATGA GCGGGCT TTTTTTTT 
aTCGA GACctgca GGTCGACC ggcatgc-3 1 

I Sail I 



I terminator 
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Table 102a: Annotated Sequence of gene 
found in pGEM-MBl (SEQ ID NO: 190) 
Protein sequence = SEQ ID NO: 191 



nucleotide 
number 



10 



5 1 - (G GATCC TCTAGA GTC) GGC- 
from pGEM polyl inker 

tttaca CTTTATGCTTCCGGCTCG tataat GTGTGG - 
-35 lacUV5 -10 



39 



15 



aATTGTGAGCGclcACAATT - 
lacO-symm operator 



59 



20 



25 



30 



35 



40 



gagctc AG (G) AGG 

SacI Shine-Dalgarno seq. 



|fM | K | K | S | L | V | L | K 
|1|2|3|4|5|6|7|8 
| ATG | AAG | AAA j TCT | CTG j GTT j CTT | AAG 

| Afl II 



|v|a|v|a|t|l|v|p 

| 11 | 12 | 13 | 14 j 15 | 16 | 17 j 18 
| GTT j GCT j GTC | GCG j ACC j CTG | GTA j CCT 
1 Nru I | [ Kpn I | 

|s|f|a|r|p|d|f|c 

| 2l| 22 j 23 j 24 j 25 | 2S\ 27 j 28 
j TCC j TTC j GCT j CGT j CCG | GAT | TTC j TGT 

j 1 AccIII | 
M13/BPTI Jnct 



|p|p|y|t|g|p|c|k 

| 31 | 32 | 33 j 34 j 35 | 3 6 j 37 j 38 
| CCA | CCA | TAC | ACT j GGG | CCC | TGC j AAA 

| PflM I I | | 

I Apa I M 
1 Dra II [ 
1 Pss I 1 



CttaCT- 



A | S 
9 | 10 
GCT | AGC 
Nhe I 



77 



M | L 
19| 20 
ATG | TTG 



L | E 
29 1 30 
CTC j GAG 
Ava I 



Xho I 



A | R 
39 j 40 
GCG | CGC 
BssH II 



107 



137 



167 



197 






Table 102a : Annotated Sequence 
of gene found in pGEM-MBl 
(continued) 






|i|i|r|y|f|y|n|a|k|a| 

| 41 j 42 j 43 I 44 I 45 | 46 j 47 | 48 j 4 9 | 50 | 
| ATC j ATC j CGC | TAT j TTC | TAC | AAT | GCT j AAA j GC | 



|G|L|C|Q|T|F 
j 51 j 52 j 53 | 54 j 55 j 56 
A | GGC | CTG | TGC | CAG | ACC | TTT 
| Stu I 1 



V I Y 
57 j 58 
GTA j TAC 
Acc I 



Xca I 



G | G I 
59 | 60 | 
GGT | GGT j 



226 



257 



|c|r|a|k|r|n|n|f|k| 

| 61 j 62 j 63 | 64 | 65 | 66 | 67 | 68 j 69 j 
| TGC j CGT j GCT | AAG j CGT | AAC j AAC | TTT | AAA j - 
1 Esp I 1 

|s|a|e|d|c|m|r|t|c|g| 

j 70 j 71 j 72 I 73 j 74 | 75 j 76 j 77 | 78 j 79 | 
j TCG j GCC j GAA | GAT | TGC j ATG j CGT j ACC j TGC j GGT j - 
|XmaIII 1 | Sph I | 



284 



314 



BPTI/M13 boundary 
I 

g|a|a|e|g|d|d|p|a|k|a|a| 

80 | 81 j 82 | 83 j 84 | 85 j 86 j 87 j 88 j 89 j 90 | 91 j 
GGC | GCC j GCT j GAA j GGT j GAT j GAT j CCG | GCC j AAG j GCG j GCC j 
Bbe I I Sfi I 



350 



Nar I 



f|n|s|l|q|a|s|a|t| 

92 j 93 j 94 | 95 j 96 | 97 j 98 j 99 j 100 | 
TTC | AAT j TCT | CTG j CAA j GCT j TCT | GCT j ACC | 

| Hind 3 | 

e|y|i|g|y|a|w| 

101|102|103|104|105|10 6|107| 
GAG j TAT j ATT | GGT j TAC j GCG j TGG j - 

a|m|v|v|v|i|v|g|a| 

108|109|110|111|112| 113 | 114 | 115 | 116 j 
GCC | ATG | GTG j GTG j GTT j ATC j GTT j GGT | GCT j 

1 BstX I | 

| Nco I | 



377 



398 



425 



409 



Table 102a : Annotated Sequence 
of gene found in pGEM-MBl 
(continued) 

| T | I | G | I | 
| 117 | 118 | 119 | 120 | 

| ACC | ATC | GGG j ATC | - 437 

10 

|k|l|f|k|k|f|t|s|k|a| 

j 121 j 122 | 12 3 | 124 | 12 5 | 12 6 | 12 7 j 12 8 j 12 9 | 13 0 | 
| AAA j CTG | TTC | AAG | AAG j TTT j ACT j TCG j AAG | GCG | - 467 

1 Asu II | 

15 

I S | . | . | . | 

| 131 | 132 | 133 | 134 | 

|TCT|TAA|TGA|TAG| GGTTACC- 486 

Bst E II 

20 

AGTCTA AGCCCGC CTAATGA GCGGGCT TTTTTTTT- 521 
terminator 



25 

aTCGA (GACctgcaggcatgc) -3 1 

( Sai l ) from pGEM polyl inker 



Notes:

a Design called for Shine-Dalgarno sequence, AGGAGG,
but sequencing shows that actual constructed gene contains
AGAGG.

Note the following enzyme equivalences:

Xma III = Eag I          Acc III = BspM II
Dra II = EcoO109 I       Asu II = BstB I






Table 102b : Annotated Sequence of gene 
after insertion of Sai l linker (SEQ ID NO: 192) 
Protein sequence = (SEQ ID NO: 191) 



5 1 - (GGATCC TCTAGA GTC) GGC- 
from pGEM poly linker 



nucleotide 
number 



tttaca CTTTATGCTTCCGGCTCG tataat GTGTGG- 
-35 lacUV5 -10 



39 



15 



aATTGTGAGCGcTcACAATT - 
lacO-symm operator 



59 



20 gagctc AGAGG CttaCT- 

SacI Shine-Dalgarno seq. 



77 



25 



|£M | K | K | S | L | V | L | K | A | S 
|1|2|3|4|5|6|7|8|9|10 
| ATG j AAG | AAA | TCT j CTG | GTT j CTT | AAG | GCT j AGC | 

Af 1 II Nhe I 



107 



30 



35 



|v|a|v|a|t|l|v|p|m|l 

I 11| 12 I 13 | 14 | 15| 16 j 17 j 18| 19 | 20 
| GTT | GCT j GTC | GCG j ACC j CTG j GTA j CCT j ATG j TTG | 
1 Nru I 1 1 Kpn I [ 

| S | F | A | R | P | D | F | C | L | E 
j 21 | 22 | 23 | 24 | 2S \ 26 j 27 | 28 | 29 | 30 
| TCC | TTC j GCT j CGT | CCG j GAT j TTC j TGT j CTC j GAG | 

f | AccIII | | Ava I 

M13/BPTI Jnct I Xho I 



137 



167 



40 



45 



|p|p|y|t|g|p|c|k 

j 31 | 32 | 33 | 34 | 35 j 36 j 37 j 38 
| CCA | CCA | TAC j ACT | GGG j CCC | TGC | AAA 
1 PflM I [ I I 

I Apa I M 
Dra II | 

I Pss I I 



A | R 
39 j 40 
GCG | CGC 
BssH II 



197 



411 



Table 102b : Annotated Sequence 
of gene after insertion of Sai l linker 
(continued) 



|i|i|r|y|f|y|n|a|k|a| 

I 4l| 42 j 43 j 44 j 45 j 46 j 47 j 48 j 49 | 50 j 
I ATC ATC CGC TAT TTC TAC AAT GCT | AAA GC 



226 



|G|L|C|Q|T|F|V| Y 
10 | 51| 52 | 53 | 54 | 55 j 56 j 57 | 58 
A | GGC | CTG | TGC | CAG | ACC | TTT | GTA | TAC 
I Stu I I Acc I 



Xca I 



G | G I 
59 | 60 | 
GGT GGT 



257 



15 



20 



25 



30 



35 



40 



45 



| C | R | A | K [ R | N | N | F | K | 
j 61 | 62 | 63 | 64 | 65 | 66 | 67 j 68 j 69 j 
j TGC | CGT | GCT | AAG | CGT | AAC | AAC j TTT j AAA j - 
1 Esp I I 

|S|A|E|D|C|M|R|T|C|G| 
| 70j 71| 72 | 73 | 74 | 75 | 76 j 77 | 78 j 79 j 
| TCG | GCC | GAA | GAT | TGC | ATG j CGT j ACC j TGC j GGT j 
IXmalll 1 | Sph I [ 

BPTI/M13 boundary 
4 



G | A 
80 j 81 
GGC | GCC 
Bbe I 



Nar I 



F j N 
92 j 93 
TTC | AAT 



E I Y 
101 j 102 
GAG | TAT 

A | M 
108 | 109 
GCC ATG 



284 



314 



A | E | G | D | D | P | A | K | A | A | 

82 j 83 | 84 | 85 j 86 j 87 | 88 | 89| 90 j 91 j 

GCT j GAA | GGT j GAT j GAT j CCG j GCC j AAG j GCG j GCC | - 

I Sfi I I 



350 



S|L|Q|A|S|A|T| 
94 | 95 j 96 j 97 | 98 | 99|l00| 
TCT | CTG | CAA j GCT j TCT | GCT | ACC | 
| Hind 3 | 

I | G | Y | A | W | 
103 I 104 | 105 j 106 | 107 j 
ATT j GGT j TAC j GCG j TGG j - 

v|v|v|i|v|g|a| 

110 | 111 | 112 j 113 | 114 j 115 j 116 j 
GTG j GTG j GTT | ATC | GTT j GGT j GCT j 



377 



398 



425 



BstX I 



| Nco I | 



412 



10 



15 



20 



25 



30 



Table 102b: Annotated Sequence 
after insertion of Sai l linker 
(continued) 



| T | I | G | I | 
| 117 | 118 | 119 | 120 | 

| ACC | ATC | GGG | ATC | - 437 



|K|L|F|K|K|F|T|S|K|A| 
| 121 | 122 j 123 j 124 j 125 | 12 6 j 12 7 j 128 j 12 9 j 13 0 j 
| AAA | CTG | TTC | AAG j AAG j TTT j ACT | TCG j AAG | GCG | - 4 67 

|Asu Il| 

I s | . | . | . | 

j 131| 132 | 133 | 134 | 

|TCT|TAA|TGA|TAG| GGTTACC - 4 86 

BstE II 



AGTCTA AGCCCGC CTAATGA GCGGGCT TTTTTTTT- 521 
terminator 



aTCGA GACctgca GGTCGACC ggcatgc-3 1 

| Sail | 

Note the following enzyme equivalences, 

Xma III = Eag I Acc III = BspM II 

Dra II = EcoO109 I Asu II = BstB I 



413 



Table 102: Annotated Sequence
of osp-ipbd gene
(continued)

Table 102c: Calculated properties of Peptide

For the apoprotein

Molecular weight of peptide = 16192
Charge on peptide = 9
[A+G+P] = 36
[C+F+H+I+L+M+V+W+Y] = 48
[D+E+K+R+N+Q+S+T+.] = 48

For the mature protein

Molecular weight of peptide = 13339
Charge on peptide = 6
[A+G+P] = 31
[C+F+H+I+L+M+V+W+Y] = 37
[D+E+K+R+N+Q+S+T+.] = 41



Table 102d: Codon Usage

First            Second Base            Third
Base        t     c     a     g         base

 t          3     4     2     1           t
            5     1     4     5           c
            0     0     0     0           a
            1     2     0     1           g

 c          1     1     0     4           t
            1     1     0     2           c
            0     2     1     0           a
            5     2     1     0           g

 a          1     2     2     0           t
            5     5     2     1           c
            0     0     5     0           a
            4     0     7     0           g

 g          4     9     4     6           t
            1     5     0     2           c
            2     1     2     0           a
            2     5     2     2           g
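The codon-usage tally can be checked against the annotated gene. The following is a minimal sketch, assuming the coding region is the 134 codons listed in Table 102a and that the three stop codons are left out of the tally (the table shows zero for taa, tga and tag). The names `GENE`, `STOPS` and `codon_usage` are illustrative only.

```python
from collections import Counter

# Codons 1-134 of the gene in Table 102a, transcribed in upper case.
GENE = (
    "ATG AAG AAA TCT CTG GTT CTT AAG GCT AGC "
    "GTT GCT GTC GCG ACC CTG GTA CCT ATG TTG "
    "TCC TTC GCT CGT CCG GAT TTC TGT CTC GAG "
    "CCA CCA TAC ACT GGG CCC TGC AAA GCG CGC "
    "ATC ATC CGC TAT TTC TAC AAT GCT AAA GCA "
    "GGC CTG TGC CAG ACC TTT GTA TAC GGT GGT "
    "TGC CGT GCT AAG CGT AAC AAC TTT AAA TCG "
    "GCC GAA GAT TGC ATG CGT ACC TGC GGT GGC "
    "GCC GCT GAA GGT GAT GAT CCG GCC AAG GCG "
    "GCC TTC AAT TCT CTG CAA GCT TCT GCT ACC "
    "GAG TAT ATT GGT TAC GCG TGG GCC ATG GTG "
    "GTG GTT ATC GTT GGT GCT ACC ATC GGG ATC "
    "AAA CTG TTC AAG AAG TTT ACT TCG AAG GCG "
    "TCT TAA TGA TAG"
)
STOPS = {"TAA", "TGA", "TAG"}


def codon_usage(gene: str) -> Counter:
    """Count sense codons in a whitespace-separated codon string."""
    return Counter(c for c in gene.split() if c not in STOPS)


usage = codon_usage(GENE)

# Print in the layout of Table 102d: one row per (first base, third base),
# one column per second base.
for first in "TCAG":
    for third in "TCAG":
        row = [usage[first + second + third] for second in "TCAG"]
        print(first.lower(), third.lower(), row)
```

Each printed row corresponds to one line of the table, with the four counts ordered by second base t, c, a, g.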



Table 102e: Amino-acid frequency

Encoded polypeptide

AA   #     AA   #     AA   #     AA   #
A   20     C    6     D    4     E    4
F    8     G   10     H    0     I    6
K   12     L    8     M    4     N    4
P    6     Q    2     R    6     S    8
T    7     V    9     W    1     Y    6
.    1

Mature protein

AA   #     AA   #     AA   #     AA   #
A   16     C    6     D    4     E    4
F    7     G   10     H    0     I    6
K    9     L    4     M    2     N    4
P    5     Q    2     R    6     S    5
T    6     V    5     W    1     Y    6
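The frequencies above can be reproduced by translating the gene of Table 102a. The sketch below assumes the standard genetic code and assumes that the mature protein begins at residue 24, immediately after the point Table 102a marks as the M13/BPTI junction. The names `GENE`, `CODON_TABLE`, `translate`, `encoded` and `mature` are illustrative only.

```python
from collections import Counter

# Codons 1-134 of the gene in Table 102a, transcribed in upper case.
GENE = (
    "ATG AAG AAA TCT CTG GTT CTT AAG GCT AGC "
    "GTT GCT GTC GCG ACC CTG GTA CCT ATG TTG "
    "TCC TTC GCT CGT CCG GAT TTC TGT CTC GAG "
    "CCA CCA TAC ACT GGG CCC TGC AAA GCG CGC "
    "ATC ATC CGC TAT TTC TAC AAT GCT AAA GCA "
    "GGC CTG TGC CAG ACC TTT GTA TAC GGT GGT "
    "TGC CGT GCT AAG CGT AAC AAC TTT AAA TCG "
    "GCC GAA GAT TGC ATG CGT ACC TGC GGT GGC "
    "GCC GCT GAA GGT GAT GAT CCG GCC AAG GCG "
    "GCC TTC AAT TCT CTG CAA GCT TCT GCT ACC "
    "GAG TAT ATT GGT TAC GCG TGG GCC ATG GTG "
    "GTG GTT ATC GTT GGT GCT ACC ATC GGG ATC "
    "AAA CTG TTC AAG AAG TTT ACT TCG AAG GCG "
    "TCT TAA TGA TAG"
)

# Standard genetic code, bases ordered T, C, A, G; '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(gene: str) -> str:
    """Translate a whitespace-separated codon string into one-letter residues."""
    return "".join(CODON_TABLE[codon] for codon in gene.split())


encoded = translate(GENE).rstrip("*")  # drop the three terminal stop codons
mature = encoded[23:]                  # assumed cleavage after residue 23

print(sorted(Counter(encoded).items()))
print(sorted(Counter(mature).items()))
```

Under these assumptions the encoded polypeptide has 131 residues and the mature protein 108, which agrees with the column totals of the two frequency tables.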
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Table 102f : Enzymes used to manipulate BPTI-gp8 fusion 



SacI                        G AGCT|C
AflII                       C|TTAA G
NheI                        G|CTAG C
NruI                        TCG|CGA
KpnI                        G GTAC|C
AccIII = BspMII             T CCGGA
AvaI                        C|yCGr G
XhoI                        C|TCGA G
PflMI                       CCAn nnn|nTGG      (SEQ ID NO: 88)
BssHII                      G|CGCG C
ApaI                        G GGCC|C
DraII = EcoO109I            rG GnC|Cy          (Same as PssI)
StuI                        AGG|CCT
AccI                        GT|mkA C
XcaI                        GTA|TAC
EspI                        GC|TnA GC
XmaIII                      C|GGCC G           (Supplier ?)
SphI                        G CATG|C
BbeI                        G GCGC|C           (Supplier ?)
NarI                        GG CG|CC
SfiI (SEQ ID NO: 151)       GGCC nnnn|nGGCC
HindIII                     A|AGCT T
BstXI (SEQ ID NO: 193)      C CAnnnnn|nTGG
NcoI                        C|CATG G
AsuII = BstBI               TT|CGA A
BstEII                      G|GTnAC C
SalI                        G|TCGAC
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Table 103 : Annotated Sequence of osp-ipbd gene 
DNA sequence = SEQ ID NO: 194 
Protein sequence = SEQ ID NO: 191 



5 Underscored bases indicate sites of overlap between annealed 
synthetic duplexes . 

5 ' - 

10 /GGC tttaca CTTTAT , GCTTCCGGCTCG tataat GTGTGG- 

lacUVB 



aATTGTGAGCGcTcACAATT- 
15 lacO-symm operator 



gagctc AG (G) /AGG CttaCT- 

Sac I Shine -Dalgarno seq. 



|fM | k |k|s|l|v|l|k|a|s| 

I 1 I 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10| 
25 | ATG I AAG , j AAA j TCT j CTG j GTT | CTT j AAG | GCT | AGC | - 

| Afl II| Nhe I | 



|V|A|V|A|T|L|V|P|M| L | 
30 | 11 | 12 | 13 j 14 | 15 | 16 | 17 j 18 j 19 j 20 l 
| GTT | GCT j GTC j GCG j ACC | CTG j GTA j CCT j ATG j T /TG] - 



20 



| Nru I | 



I Kpn I | 



35 



| S | F | A | R | P | D | F | C 
j 21 j 22 j 23 | 24 | 25 j 26 j 27 | 28 
iTCClTTClGCTlCG .T j CCG | GAT j TTC j TGT 



t | AccIII 1 



L | E | 
29 | 30 ( 
CTC j GAG j - 
Ava I | 



M13/BPTI Jnct 



Xho I | 
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Table 103 : Annotated Sequence 
of osp-ipbd gene 
(continued) 



10 



15 



20 



25 



30 



35 



40 



P | P | Y | T | G | P 
31 | 32 | 33 I 34 j 35 j 36 
CCA | CCA j TAC | ACT | GGG | CCC 
PflM I 



1 




1 A p a 


I j j 


| Dra 


II 1 


| Pss 


I 



C | K | A | R | 
37 j 38 j 39 | 40| 
TGC | AAA j GCG j CGC j 
BssH II 



| I | I | R | Y j F J Y 
| 41 j 42 | 43 j 44 | 45 | 46 
| ATC | ATC j CG /ClTAT|TTC|TAC 



|G|L|C|Q|T|F 
| 51 j 52 j 53 j 54 j 55 j 56 

A j GGC j CTG | TGC j CAG | ACC | TTT 

| Stu I 1 



| C | R | A | K | R | N 
j 61 | 62 j 63 j 64 | 65 j 66 
j TGC j CGT | GCT | AAG | CGT | / AAC 
I Esp I | 

| S | A | E | D | C | M 
| 70 j 71 j 72 j 73 | 74 | 75 
[TCG , | GCC | GAA j GAT j TGC j ATG 
Xma III | | Sph 



N | A I K | A 
47 | 48 j 49 | 50 
AAT GC,T AAA GC 



V | Y | G | G | 
57 | 58 | 59 | 60 j 
GTA j TAC j GGT j GGT | 
Acc I 



Xca I 



N | F j K | 
67 | 68 j 69 j 
AAC TTT AAA 



R | T | C | G | 
76 | 77 j 78 | 79| 
CGT j ACC j TGC j GGT j 



BPTI/M13 boundary 

g|a|a|e|g|d|d|p|a|k|a| a | 

80 | 81 | 82 j 83 | 84 | 85 | 86 j 87 j 88 j 89 | 90 j 91 j 
GGC j GCC j GCT | GAA j GGT j GAT | GAT | CCG j GCC j AAG j GCG j G /CC [ 
Bbe I | j Sfi I [ 



Nar I | 



45 | F | N | S | L | Q | A | S | A | T | 
j 92 | 93 | 94 j 95 j 96 j 97 j 98 j 99 j 100 | 
| TTC 1 AAT | TCT | CTG | C , AA j GCT | TCT j GCT | ACC | - 
" Hind 3 
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Table 103 : Annotated Sequence 
of osp-ipbd gene 
(continued) 



|E|Y|I|G|Y|A|W| 
j 101 j 102 I 103 j 104 | 105 | 106 | 107 | 
| GAG j TAT | ATT j GGT j TAC j GCG j TGG j - 



10 

| A | M | V | V | V | I | V | G | A | 

j 108 j 109 I 110 j 111 | 112 | 113 j 114 j 115 j 116 j 
| GCC j ATG j GTG j GTG j GTT j AT /C | GTT | GGT | GCT | - 

| BstX I |_ 

15 1 Nco I | 

I T | I | G | I | 
I 117 | 118 | 119 | 120 | 
_[ACC , | ATC | GGG j ATC j - 



|k|l|f|k|k|f|t|s|k|a| 

j 121 j 122 I 123 j 124 j 125 j 126 j 127 j 128 j 129 j 130 j 
j AAA | CTG j TTC | AAG j AAG j TTT j ACT j TCG j AAG j GCG j - 
25 j Asu 1 1 | 



I s | . | ■ | . | 

I 131| 132 | 133 | 134 | 

| TCT [ TAA | TGA j TAG | GGTT A/CC- 

3 0 BstE 1 1 



AGTCTA AGCCC ,GC CTAATGA GCGGGCT TTTTTTTT- 
terminator 

35 

a / (TCGA) , -3 1 
(Sal I) 



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Table 107: In vitro transcription/translation 
analysis of vector-encoded
signal::BPTI::mature VIII protein species

                      31 kd species a    14.5 kd species b

No DNA (control)      - c
pGEM-3Zf(-)           +
pGEM-MB16             +
pGEM-MB20             +                  +
pGEM-MB26             +                  +
pGEM-MB42             +                  +
pGEM-MB46             ND                 ND

Notes:

a.) pre-beta-lactamase, encoded by the amp (bla) gene.
b.) pre-BPTI/VIII peptides encoded by the synthetic gene and derived constructs.
c.) - for absence of product; + for presence of product; ND for Not Determined.
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Table 108: Western analysis 3 of in vivo 
expressed signal::BPTI::mature VIII protein species

A) expression in strain XL1-Blue

                signal    14.5 kd species b    12 kd species c

pGEM-3Zf(-)               -
pGEM-MB16       VIII
pGEM-MB20       VIII      + +
pGEM-MB26       VIII      +++                  +/-
pGEM-MB42       phoA      + + +

B) expression in strain SEF'

                signal    14.5 kd species b    12 kd species c

pGEM-MB42       phoA      +/-                  +++

Notes:

a) Analysis using rabbit anti-BPTI polyclonal antibodies and horseradish-peroxidase-conjugated goat anti-rabbit IgG antibody.
b) pro-BPTI/VIII peptides encoded by the synthetic gene and derived constructs.
c) processed BPTI/VIII peptide encoded by the synthetic gene.
d) not present             -
   weakly present          +/-
   present                 +
   strong presence         ++
   very strong presence    +++
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Table 109: M13 gene III (SEQ ID NO: 229) 
1579 5'-GT GAAAAAATTA TTATTCGCAA TTCCTTTAGT
1611 TGTTCCTTTC TATTCTCACT CCGCTGAAAC TGTTGAAAGT
1651 TGTTTAGCAA AACCCCATAC AGAAAATTCA TTTACTAACG
1691 TCTGGAAAGA CGACAAAACT TTAGATCGTT ACGCTAACTA
1731 TGAGGGTTGT CTGTGGAATG CTACAGGCGT TGTAGTTTGT
1771 ACTGGTGACG AAACTCAGTG TTACGGTACA TGGGTTGCTA
1811 TTGGGCTTGC TATCCCTGAA AATGAGGGTG GTGGCTCTGA
1851 GGGTGGCGGT TCTGAGGGTG GCGGTTCTGA GGGTGGCGGT
1891 ACTAAACCTC CTGAGTACGG TGATACACCT ATTCCGGGCT
1931 ATACTTATAT CAACCCTCTC GACGGCACTT ATCCGCCTGG
1971 TACTGAGCAA AACCCCGCTA ATCCTAATCC TTCTCTTGAG
2011 GAGTCTCAGC CTCTTAATAC TTTCATGTTT CAGAATAATA
2051 GGTTCCGAAA TAGGCAGGGG GCATTAACTG TTTATACGGG
2091 CACTGTTACT CAAGGCACTG ACCCCGTTAA AACTTATTAC
2131 CAGTACACTC CTGTATCATC AAAAGCCATG TATGACGCTT
2171 ACTGGAACGG TAAATTCAGA GACTGCGCTT TCCATTCTGG
2211 CTTTAATGAG GATCCATTCG TTTGTGAATA TCAAGGCCAA
2251 TCGTCTGACC TGCCTCAACC TCCTGTCAAT GCTGGCGGCG
2291 GCTCTGGTGG TGGTTCTGGT GGCGGCTCTG AGGGTGGTGG
2331 CTCTGAGGGT GGCGGTTCTG AGGGTGGCGG CTCTGAGGGA
2371 GGCGGTTCCG GTGGTGGCTC TGGTTCCGGT GATTTTGATT
2411 ATGAAAAGAT GGCAAACGCT AATAAGGGGG CTATGACCGA
2451 AAATGCCGAT GAAAACGCGC TACAGTCTGA CGCTAAAGGC
2491 AAACTTGATT CTGTCGCTAC TGATTACGGT GCTGCTATCG
2531 ATGGTTTCAT TGGTGACGTT TCCGGCCTTG CTAATGGTAA
2571 TGGTGCTACT GGTGATTTTG CTGGCTCTAA TTCCCAAATG
2611 GCTCAAGTCG GTGACGGTGA TAATTCACCT TTAATGAATA
2651 ATTTCCGTCA ATATTTACCT TCCCTCCCTC AATCGGTTGA
2691 ATGTCGCCCT TTTGTCTTTA GCGCTGGTAA ACCATATGAA
2731 TTTTCTATTG ATTGTGACAA AATAAACTTA TTCCGTGGTG
2771 TCTTTGCGTT TCTTTTATAT GTTGCCACCT TTATGTATGT
2811 ATTTTCTACG TTTGCTAACA TACTGCGTAA TAAGGAGTCT
2851 TAATCATGCC AGTTCTTTTG GGTATTCCGT
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Table 110: Introduction of Nar l into gene III 

DNA sequence: SEQ ID NO: 230
Protein sequence: SEQ ID NO: 231

A) Wild-type III, portion encoding the signal peptide

           M   K   K   L   L   F   A   I   P   L
           1   2   3   4   5   6   7   8   9  10
1579  5'-GTG AAA AAA TTA TTA TTC GCA ATT CCT TTA

                                         / Cleavage site
           V   V   P   F   Y   S   H   S ^ A   E   T   V
          11  12  13  14  15  16  17  18  19  20  21  22
1609     GTT GTT CCT TTC TAT TCT CAC TCC GCT GAA ACT GTT-3'

DNA sequence: SEQ ID NO: 232
Protein sequence: SEQ ID NO: 233

B) III, portion encoding the signal peptide with NarI site

           m   k   k   l   l   f   a   i   p   l
           1   2   3   4   5   6   7   8   9  10
1579  5'-gtg aaa aaa tta tta ttc gca att cct tta

                                         / cleavage site
           v   v   p   f   y   s   G   A   a   e   t   v
          11  12  13  14  15  16  17  18  19  20  21  22
1609     gtt gtt cct ttc tat tct GGc Gcc gct gaa act gtt-3'
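The effect of the two codon changes in part B can be checked directly: replacing codons 17 and 18 (CAC TCC) with GGC GCC creates the NarI recognition sequence GGCGCC listed in Table 102f. The sketch below assumes the numbering of Table 110, in which the first base of codon 1 is nucleotide 1579. The names `WILD_TYPE`, `mutant` and `site_start` are illustrative only.

```python
NAR_I = "GGCGCC"  # NarI recognition sequence, as listed in Table 102f

# Codons 1-22 of wild-type gene III, from part A of Table 110.
WILD_TYPE = (
    "GTG AAA AAA TTA TTA TTC GCA ATT CCT TTA "
    "GTT GTT CCT TTC TAT TCT CAC TCC GCT GAA ACT GTT"
).split()

# Part B changes codon 17 (CAC -> GGC) and codon 18 (TCC -> GCC).
mutant = list(WILD_TYPE)
mutant[16] = "GGC"
mutant[17] = "GCC"

wild_type_dna = "".join(WILD_TYPE)
mutant_dna = "".join(mutant)

FIRST_BASE = 1579  # nucleotide number of the first base of codon 1 (Table 110)
site_start = FIRST_BASE + mutant_dna.index(NAR_I)

print("NarI site in wild type:", NAR_I in wild_type_dna)
print("NarI site in mutant:   ", NAR_I in mutant_dna, "starting at", site_start)
```

The site begins at the first base of codon 17, which is 18 bases past the start of codon 11 at position 1609.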
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Table 111: IIIsp : : bpti : : matureIII fusion gene. 
DNA sequence: SEQ ID NO: 2 34 
Protein sequence: SEQ ID NO: 235 

5 mkkllfalpl 

123456789 10 
5 1 -gtg aaa aaa tta tta ttc gca att cct tta 
| < gene III signal peptide 

10 t cleavage site 

vvpfysGA 
11 12 13 14 15 16 17 18 
gtt gtt cct ttc tat tct GGc Gcc 

>| 

15 

|R|P|D|FjC|L|E| 

I 19 1 2 °i 21 I 22 1 23 t 24 l 25 l 

I CGT j CCG j GAT j TTC j TGT j CTC | GAG j - 
j | AccIII | | Ava I | 

20 M13/BPTI Jnct | Xho I | 

|p|p|y|t|g|p|c|k|a|r| 

j 26 | 27 | 28 | 29 j 30 j 31 j 32 j 33 j 34 | 35 | 
| CCA | CCA | TAC j ACT | GGG | CCC | TGC | AAA | GCG j CGC j - 

25 | PflM I [ j | jBssH II | 

Apa I 



| Dra 




| Pss 


i i 



30 |i|i|r|y|f|y|n|a|k|a| 

j 36 1 37 j 38 | 39| 40 j 41 j 42 j 43 j 44 j 45 j 
j ATC | ATC j CGC | TAT | TTC j TAC j AAT j GCT j AAA j GC j - 

|G|L|C|Q|T|F|V|Y|G|G| 
35 j 46 j 47| 48| 49 j 50 j 51 j 52 | 53 j 54 j 55 j 

A | GGC | CTG | TGC | CAG | ACC | TTT | GTA | TAC | GGT | GGT | • 
| Stu I 1 | Acc I | 

I Xca I I 



40 |c|r|a|k|r|n|n|f|k| 

I 56 I 57 I 58 | 59 j 60 j 61 j 62 j 63 j 64 | 
j TGC | CGT | GCT | AAG j CGT j AAC j AAC j TTT j AAA j 
I Esp I | 
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Table 111, continued 



s|a|e|d|c|m|r|t|c|g| 

65 I 66 | 67 | 68 j 69 | 70 | 71 1 72 j 73 j 74 j 
TCG | GCC j GAA j GAT | TGC j ATG j CGT j ACC | TGC j GGT | 
Ixmalll | | Sph I | 



10 



15 



BPTI/M13 boundary 



At 



G 

75 j 76 | 
GGC | GCC | 
Bbe I | 
Nar I I 



GAaetves 
77 78 79 80 81 82 83 84 
2 0 GGc Gcc get gaa act gtt. GAA AGT 

1651 TGTTTAGCAA AACCCCATAC AGAAAATTCA TTTACTAACG
1691 TCTGGAAAGA CGACAAAACT TTAGATCGTT ACGCTAACTA
1731 TGAGGGTTGT CTGTGGAATG CTACAGGCGT TGTAGTTTGT
1771 ACTGGTGACG AAACTCAGTG TTACGGTACA TGGGTTCCTA
1811 TTGGGCTTGC TATCCCTGAA AATGAGGGTG GTGGCTCTGA
1851 GGGTGGCGGT TCTGAGGGTG GCGGTTCTGA GGGTGGCGGT
1891 ACTAAACCTC CTGAGTACGG TGATACACCT ATTCCGGGCT
1931 ATACTTATAT CAACCCTCTC GACGGCACTT ATCCGCCTGG
1971 TACTGAGCAA AACCCCGCTA ATCCTAATCC TTCTCTTGAG
2011 GAGTCTCAGC CTCTTAATAC TTTCATGTTT CAGAATAATA
2051 GGTTCCGAAA TAGGCAGGGG GCATTAACTG TTTATACGGG
2091 CACTGTTACT CAAGGCACTG ACCCCGTTAA AACTTATTAC
2131 CAGTACACTC CTGTATCATC AAAAGCCATG TATGACGCTT
2171 ACTGGAACGG TAAATTCAGA GACTGCGCTT TCCATTCTGG
2211 CTTTAATGAG GATCCATTCG TTTGTGAATA TCAAGGCCAA
2251 TCGTCTGACC TGCCTCAACC TCCTGTCAAT GCTGGCGGCG
2291 GCTCTGGTGG TGGTTCTGGT GGCGGCTCTG AGGGTGGTGG
2331 CTCTGAGGGT GGCGGTTCTG AGGGTGGCGG CTCTGAGGGA
2371 GGCGGTTCCG GTGGTGGCTC TGGTTCCGGT GATTTTGATT
2411 ATGAAAAGAT GGCAAACGCT AATAAGGGGG CTATGACCGA
2451 AAATGCCGAT GAAAACGCGC TACAGTCTGA CGCTAAAGGC
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Table 111, continued 

2491 AAACTTGATT CTGTCGCTAC TGATTACGGT GCTGCTATCG
2531 ATGGTTTCAT TGGTGACGTT TCCGGCCTTG CTAATGGTAA
2571 TGGTGCTACT GGTGATTTTG CTGGCTCTAA TTCCCAAATG
2611 GCTCAAGTCG GTGACGGTGA TAATTCACCT TTAATGAATA
2651 ATTTCCGTCA ATATTTACCT TCCCTCCCTC AATCGGTTGA
2691 ATGTCGCCCT TTTGTCTTTA GCGCTGGTAA ACCATATGAA
2731 TTTTCTATTG ATTGTGACAA AATAAACTTA TTCCGTGGTG
2771 TCTTTGCGTT TCTTTTATAT GTTGCCACCT TTATGTATGT
2811 ATTTTCTACG TTTGCTAACA TACTGCGTAA TAAGGAGTCT
2851 TAATCATGCC AGTTCTTTTG GGTATTCCGT
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Table 112 : Annotated Sequence of 
Ptac: :RBS (GGAGGAAATAAA) : : (SEQ ID NO: 241) 
VIII -signal : : mature -bp ti : : mature -VI I I -coat -protein 

gene (SEQ ID NO: 23 6) 
Protein sequence: SEQ ID NO: 153 

5 1 -GGATCC actccccatcccc 



BamHI 



ctg TTGACA attaatcatcgGCTCG tataat GTGTGG- 
-35 tac -10 



15 



aATTGTGAGCGcTcACAATT - 
lacO-symm operator 



GAGCTC T ggagga 

SacI Shine -Dalgarno seq. 



AATAAA- 



20 |fM | K | K | S | L | V | L | K | A | S 
|l|2|3|4|5|6|7|8|9| 10 
| ATG | AAG | AAA | TCT | CTG j GTT j CTT | AAG | GCT j AGC 

Afl II Nhe I 



25 |v|a|v|a|t|l|v|p|m|l 

j 11 | 12 | 13 | 14 | 15 j 16 j 17 j 18 j 19 | 20 
j GTT | GCT j GTC | GCG | ACC | CTG j GTA j CCT | ATG | TTG | 
1 Nru I | | Kpn I | 



30 



35 



40 



| S | F | A 
| 21 | 22 | 23 
| TCC | TTC | GCT 

M13/BPTI Jnct 



R j P I D | F j C 

24| 25| 2 6 | 27| 28 

CGT j CCG j GAT | TTC j TGT 
lAccIIll 



Li I E 

29 | 30 
CTC j GAG | 
Ava I 
Xho I 



|p|p|y|t|g|p|c|k|a|r 

j 31 j 32 | 33 | 34| 35 | 36 | 37 j 38 | 39 j 40 
j CCA j CCA | TAC j ACT j GGG | CCC | TGC | AAA j GCG | CGC | 
| PflM I [ I I |BssH II 

I Apa I M 

1 Dra II | 



Pss I 



1 
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Table 112 : Annotated Sequence of 
Ptac : : RBS (GGAGGAAATAAA) (SEQ ID NO:241) : : 
VIII -signal : : mature -bpti : : mature -VI I I - coat - protein gene 

(continued) 

5 



I I | I | R | Y | 


F 




N | A 


K 


A 


j 41| 42 | 43 | 44| 


45 


46 


47 | 48 


49 


50 


j ATC | ATC | CGC j TAT | 


TTC 


TAC 


AAT | GCT 


AAA 


GC 


| G | L | C | Q | 


T 


F 


V | Y 






j 51 | 52 | 53 | 54 | 


55 


56 


57 j 58 


59 


60 


A | GGC | CTG j TGC j CAG j 


ACC 


TTT 


GTA | TAC 


GGT 


GGT 


| Stu I | 






Acc I 












Xca I 







15 

|c|r|a|k|r|n|n|f|k| 

| 61 | 62 | 63 j 64 j 65 j 66 [ 67 j 68 j 69 j 
| TGC | CGT j GCT | AAG | CGT j AAC j AAC | TTT | AAA j - 
1 Esp I | 

20 

|s|a|e|d|c|m|r|t|c|g| 

| 70| 71 I 72) 73 I 74 | 75 | 76 j 77 j 78 j 79 j 
| TCG | GCC | GAA j GAT j TGC | ATG | CGT | ACC | TGC j GGT j - 
IXmalll | | Sph I [ 

25 

BPTI /Ml 3 boundary 
|g|a|a|e|g|d|d|p|a|k|a|a| 

I 80 j 81 82 | 83 | 84 j 85 | 86 j 87 j 88 | 89 | 90 j 91 | 
3 0 | GGC | GCC GCT | GAA | GGT | GAT | GAT | CCG | GCC | AAG j GCG j GCC j - 

| Bbe I | | Sfi I [ 

| Nar I 1 

|F|N|S|L|Q|A|S|A|T| 
35 | 92 | 93 | 94 | 95 | 96 j 97 j 98 j 99|l00| 

j TTC j AAT | TCT j CTG j CAA j GCT | TCT | GCT | ACC j - 

| Hind 3 [ 

|e|y|i|g|y|a|w| 

40 | 101 | 102 | 103 | 104 | 105 j 106 j 107 | 
j GAG | TAT | ATT | GGT j TAC j GCG j TGG | - 
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Table 112 : Annotated Sequence of 
Ptac::RBS(GGAGGAAATAAA) (SEQ ID NO: 241)::
VIII-signal::mature-bpti::mature-VIII-coat-protein gene

(continued) 

5 

|A|M|V|V|V|I|V|G|A| 
| 108 | 109 I 110 | 111 | 112 | 113 | 114 j 115 j 116 j 
| GCC | ATG | GTG | GTG j GTT j ATC j GTT j GGT j GCT j - 

| BstX I 1_ 

10 I Nco I I 



15 



20 



25 



| T | I | G | I | 
| 117 | 118 | 119 | 120 | 
| ACC j ATC | GGG j ATC | - 

|K|L|F|K|K|F|T|S|K|A| 
| 121 | 122 j 123 | 124 j 12 5 j 12 6 | 12 7 | 12 8 j 12 9 j 13 0 j 
| AAA j CTG | TTC j AAG j AAG j TTT | ACT j TCG j AAG j GCG j 

|Asu III 

I S | . | . | . | 

j 131 | 132 | 133 | 134 | 
| TCT | TAA | TGA j TAG | GGTTACC - 

Bst E II 

AGTCTA AGCCCGC CTAATGA GCGGGCT TTTTTTTT- 
terminator 



30 aTCGA GACctgca GGTCGACC ggcatgc-3 » 

1 Sail 1 
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Table 113 : Annotated Sequence of 
pGEM-MB42 comprising Ptac : : RBS (GGAGGAAATAAA) a : : 
phoA- signal : : mature -bpti : : mature -VI I 1 - coat -protein 
a SEQ ID NO: 241 
5 DNA sequence: SEQ ID NO: 242 

Protein sequence: SEQ ID NO: 24 0 

5 1 -GGATCC actccccatcccc 



10 BamHI 



15 



ctg TTGACA attaatcatcgGCTCG tataat GTGTGG- 
-35 tac -10 



aATTGTGAGCGcTcACAATT - 
lacO-symm operator 



20 | M | K | Q | S | T | 

| 1 | 2 | 3 | 4 | 5 | 
GAGC T C C ATGGGAGAAAATAAA | ATG | AAA j CAA | AGC | ACG | - 
| SacI | |< phoA signal peptide 



25 



30 



|I|A|L|L|P|L|L|F|T|P|V|T| 
| 6 | 7 j 8 j 9 j loj llj 12 j 13 I 14 j 15 j 16 | 17 | 
| ATC | GCA | CTC j TTA | CCG j TTA | CTG j TTT j ACC | CCT j GTG j ACA j 
phoA signal continues 



(There are no residues 20-23.) 



35 



40 



I K | A 
| 18 | 19 
| AAA|GCC 
phoA signal -> 
phoA/BPTI Jnct 

l< 



R | P | D | F | C 
24 | 25 | 26 j 27| 28 
CGT | CCG | GAT j TTC j TGT 
1 AccIII | 



BPTI insert 



L | E | 
29 | 30 | 
CTC j GAG j 
Ava I 



Xho I 
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Table 113 : Annotated Sequence of 
Ptac::RBS(GGAGGAAATAAA) (SEQ ID NO: 241)::
phoA-signal::mature-bpti::mature-VIII-coat-protein

gene (continued) 



10 



15 



20 



25 



30 



P | P | Y | T | G | P 
31 | 32 | 33 | 34 | 35 j 36 
CCA | CCA j TAC | ACT | GGG j CCC 

| PflM 1 L 

I Apa I 



C | K | A | R | 
37 j 38 j 39 j 40| 
TGC j AAA j GCG | CGC j - 
[BssH II 1 



Dra 


II 1 


| Pss 


I | 



|I|I|R|Y|F|Y 
j 4l| 42 j 43 | 44j 45 | 46 
| ATC | ATC | CGC j TAT j TTC | TAC 

|G|L|C|Q| T|F 
| 51 | 52 | 53 j 54 j 55 j 56 

A | GGC | CTG | TGC | CAG j ACC j TTT 

I Stu I I 



| C | R | A | K | R | N 
j 61 j 62 j 63 | 64 j 65 j 66 
| TGC | CGT j GCT j AAG j CGT | AAC 
I Esp I 1 



| S | A | E | D | C | M | R | T 
| 70 j 7l| 72 | 73 | 74 j 75 j 76 | 77 
j TCG | GCC j GAA j GAT j TGC j ATG j CGT | ACC 

IXmalll 1 1 Sph I | 
BPTI insert 



N j A 
47 | 48 
AAT j GCT 

V | Y 
57 | 58 
GTA | TAC 
Acc I 



Xca I 



N | F 
67 | 68 
AAC j TTT 



K | A | 

49| 50 | 

AAA j GC j 

G | G | 

59 | 60| 

GGT GGT 



K I 
69 | 
AAA 



C | G | 
78 | 79| 
TGC | GGT | 



35 



40 



BPTI /Ml 3 boundary 



G | A 
80 j 81 
GGC | GCC 
Bbe I 



v 



Nar I 



a|e|g|d|d|p|a|k|a|a| 

82 j 83 j 84 | 85 j 86 | 87 j 88 j 89 j 90 j 91 | 
GCT | GAA | GGT j GAT j GAT j CCG j GCC j AAG j GCG | GCC | 

i Sfi I I 



- BPTI--> < mature gene VIII coat protein 
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Table 113 : Annotated Sequence of 

Ptac::RBS(GGAGGAAATAAA) (SEQ ID NO: 241)::
phoA-signal::mature-bpti::mature-VIII-coat-protein gene

(continued) 

|F|N|S|L|Q|A|S|A|T| 
j 92 I 93 | 94 j 95 1 96 j 97 j 98 j 99|l00| 
| TTC | AAT | TCT | CTG j CAA | GCT j TCT j GCT j ACC | - 

I Hind 3 I 



|E|Y|I|G|Y|A|W| 
|101|102|103|104|105|106|107| 
| GAG j TAT | ATT j GGT | TAC j GCG | TGG j - 

15 |A|M|V|V|V|I|V|G|A| 
|108|109|110|111|112|113|114|115|116| 
j GCC j ATG j GTG j GTG | GTT | ATC j GTT j GGT | GCT j 
1 BstX I | 



Nco I 



20 

| T | I | G | I | 
|117|118|119|120| 
| ACC j ATC | GGG j ATC j - 

25 |k|l|f|k|k|f|t|s|k|a| 

| 121 | 122 j 123 | 124 | 12 5 | 12 6 | 12 7 j 12 8 | 12 9 j 13 0 | 
| AAA | CTG j TTC j AAG j AAG j TTT | ACT j TCG j AAG j GCG j 

| Asu II 1 

30 | S | . | . | . | 

| 131 | 132 | 133 | 134 | 

|tct|taa|tga|tag| ggttacc - 

Bst E II 

3 5 AGTCTA AGCCCGC CTAATGA GCGGGCT TTTTTTTT- 
terminator 



aTCGA GACctgca GGTCGAC-3 ' 

| Sail | 
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Table 114: Neutralization of Phage Titer Using 
Agarose-immobilized Anhydro-Trypsin

                                Percent Residual Titer
                            As a Function of Time (hours)

Phage Type    Addition          1        2        4

MK-BPTI       5 µl IS          99      104      105
              2 µl IAT         82       71       51
              5 µl IAT         57       40       27
              10 µl IAT        40       30       24

MK            5 µl IS         106       96       98
              2 µl IAT         97      103       95
              5 µl IAT        110      111       96
              10 µl IAT        99       93      106

Legend:

IS = Immobilized streptavidin
IAT = Immobilized anhydro-trypsin
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Table 115: Affinity Selection of MK-BPTI Phage 
on Immobilized Anhydro-Trypsin

                                Percent of Total Phage
Phage Type    Addition          Recovered in Elution Buffer

MK-BPTI       5 µl IS           <<1 a
              2 µl IAT          5
              5 µl IAT          20
              10 µl IAT         50

MK            5 µl IS           <<1 a
              2 µl IAT          <<1
              5 µl IAT          <<1
              10 µl IAT         <<1

Legend:

IS = Immobilized streptavidin
IAT = Immobilized anhydro-trypsin
a not detectable.
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Table 130: Sampling of a Library encoded by (NNK) 6 



A. Numbers of hexapeptides in each class

total = 64,000,000 stop-free sequences.
α can be one of [W M F Y C I K D E N H Q]
Φ can be one of [P T A V G]
Ω can be one of [S L R]

αααααα = 2985984.        Φααααα = 7464960.
Ωααααα = 4478976.        ΦΦαααα = 7776000.
ΦΩαααα = 9331200.        ΩΩαααα = 2799360.
ΦΦΦααα = 4320000.        ΦΦΩααα = 7776000.
ΦΩΩααα = 4665600.        ΩΩΩααα = 933120.
ΦΦΦΦαα = 1350000.        ΦΦΦΩαα = 3240000.
ΦΦΩΩαα = 2916000.        ΦΩΩΩαα = 1166400.
ΩΩΩΩαα = 174960.         ΦΦΦΦΦα = 225000.
ΦΦΦΦΩα = 675000.         ΦΦΦΩΩα = 810000.
ΦΦΩΩΩα = 486000.         ΦΩΩΩΩα = 145800.
ΩΩΩΩΩα = 17496.          ΦΦΦΦΦΦ = 15625.
ΦΦΦΦΦΩ = 56250.          ΦΦΦΦΩΩ = 84375.
ΦΦΦΩΩΩ = 67500.          ΦΦΩΩΩΩ = 30375.
ΦΩΩΩΩΩ = 7290.           ΩΩΩΩΩΩ = 729.

ΦΦΩΩαα, for example, stands for the set of peptides having two
amino acids from the α class, two from Φ, and two from Ω
arranged in any order. There are, for example, 729 = 3^6
sequences composed entirely of S, L, and R.
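The class sizes in part A follow from a multinomial count. The sketch below assumes the class definitions given above (12 amino acids in α, 5 in Φ, 3 in Ω) and counts, for each composition, the number of orderings of the six positions times the number of residue choices. The function name `class_count` and the tuple keys are illustrative only.

```python
from math import factorial


def class_count(n_alpha: int, n_phi: int, n_omega: int) -> int:
    """Number of hexapeptides with the given number of residues from each class."""
    assert n_alpha + n_phi + n_omega == 6
    # Ways to assign the six positions to the three classes.
    arrangements = factorial(6) // (
        factorial(n_alpha) * factorial(n_phi) * factorial(n_omega)
    )
    # Residue choices: 12 per alpha position, 5 per phi, 3 per omega.
    return arrangements * 12**n_alpha * 5**n_phi * 3**n_omega


# Keys are (number of alpha, number of phi, number of omega).
counts = {
    (a, p, 6 - a - p): class_count(a, p, 6 - a - p)
    for a in range(7)
    for p in range(7 - a)
}

for composition, n in sorted(counts.items(), reverse=True):
    print(composition, n)
print("total", sum(counts.values()))
```

The 28 classes sum to 20^6 = 64,000,000, the total stated at the head of the table.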
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Table 130: Sampling of a Library encoded by (NNK) 6 

(continued) 

B. Probability that any given stop-free DNA sequence will
encode a hexapeptide from a stated class.

Class          P             % of class

αααααα      3.364E-03       (1.13E-07)
Φααααα      1.682E-02       (2.25E-07)
Ωααααα      1.514E-02       (3.38E-07)
ΦΦαααα      3.505E-02       (4.51E-07)
ΦΩαααα      6.308E-02       (6.76E-07)
ΩΩαααα      2.839E-02       (1.01E-06)
ΦΦΦααα      3.894E-02       (9.01E-07)
ΦΦΩααα      1.051E-01       (1.35E-06)
ΦΩΩααα      9.463E-02       (2.03E-06)
ΩΩΩααα      2.839E-02       (3.04E-06)
ΦΦΦΦαα      2.434E-02       (1.80E-06)
ΦΦΦΩαα      8.762E-02       (2.70E-06)
ΦΦΩΩαα      1.183E-01       (4.06E-06)
ΦΩΩΩαα      7.097E-02       (6.08E-06)
ΩΩΩΩαα      1.597E-02       (9.13E-06)
ΦΦΦΦΦα      8.113E-03       (3.61E-06)
ΦΦΦΦΩα      3.651E-02       (5.41E-06)
ΦΦΦΩΩα      6.571E-02       (8.11E-06)
ΦΦΩΩΩα      5.914E-02       (1.22E-05)
ΦΩΩΩΩα      2.661E-02       (1.83E-05)
ΩΩΩΩΩα      4.790E-03       (2.74E-05)
ΦΦΦΦΦΦ      1.127E-03       (7.21E-06)
ΦΦΦΦΦΩ      6.084E-03       (1.08E-05)
ΦΦΦΦΩΩ      1.369E-02       (1.62E-05)
ΦΦΦΩΩΩ      1.643E-02       (2.43E-05)
ΦΦΩΩΩΩ      1.109E-02       (3.65E-05)
ΦΩΩΩΩΩ      3.992E-03       (5.48E-05)
ΩΩΩΩΩΩ      5.988E-04       (8.21E-05)
C. Number of different stop-free amino-acid sequences in
each class expected for various library sizes

Library size = 1.0000E+06
total = 9.7446E+05    % sampled = 1.52

Class       Number    (%)      Class       Number    (%)
αααααα      3362.6 (  0.1)    Φααααα     16803.4 (  0.2)
Ωααααα     15114.6 (  0.3)    ΦΦαααα     34967.8 (  0.4)
ΦΩαααα     62871.1 (  0.7)    ΩΩαααα     28244.3 (  1.0)
ΦΦΦααα     38765.7 (  0.9)    ΦΦΩααα    104432.2 (  1.3)
ΦΩΩααα     93672.7 (  2.0)    ΩΩΩααα     27960.3 (  3.0)
ΦΦΦΦαα     24119.9 (  1.8)    ΦΦΦΩαα     86442.5 (  2.7)
ΦΦΩΩαα    115915.5 (  4.0)    ΦΩΩΩαα     68853.5 (  5.9)
ΩΩΩΩαα     15261.1 (  8.7)    ΦΦΦΦΦα      7968.1 (  3.5)
ΦΦΦΦΩα     35537.2 (  5.3)    ΦΦΦΩΩα     63117.5 (  7.8)
ΦΦΩΩΩα     55684.4 ( 11.5)    ΦΩΩΩΩα     24325.9 ( 16.7)
ΩΩΩΩΩα      4190.6 ( 24.0)    ΦΦΦΦΦΦ      1087.1 (  7.0)
ΦΦΦΦΦΩ      5767.0 ( 10.3)    ΦΦΦΦΩΩ     12637.2 ( 15.0)
ΦΦΦΩΩΩ     14581.7 ( 21.6)    ΦΦΩΩΩΩ      9290.2 ( 30.6)
ΦΩΩΩΩΩ      3073.9 ( 42.2)    ΩΩΩΩΩΩ       408.4 ( 56.0)


Library size = 3.0000E+06
total = 2.7885E+06    % sampled = 4.36

Class       Number    (%)      Class       Number    (%)
αααααα     10076.4 (  0.3)    Φααααα     50296.9 (  0.7)
Ωααααα     45190.9 (  1.0)    ΦΦαααα    104432.2 (  1.3)
ΦΩαααα    187345.5 (  2.0)    ΩΩαααα     83880.9 (  3.0)
ΦΦΦααα    115256.6 (  2.7)    ΦΦΩααα    309107.9 (  4.0)
ΦΩΩααα    275413.9 (  5.9)    ΩΩΩααα     81392.5 (  8.7)
ΦΦΦΦαα     71074.5 (  5.3)    ΦΦΦΩαα    252470.2 (  7.8)
ΦΦΩΩαα    334106.2 ( 11.5)    ΦΩΩΩαα    194606.9 ( 16.7)
ΩΩΩΩαα     41905.9 ( 24.0)    ΦΦΦΦΦα     23067.8 ( 10.3)
ΦΦΦΦΩα    101097.3 ( 15.0)    ΦΦΦΩΩα    174981.0 ( 21.6)
ΦΦΩΩΩα    148643.7 ( 30.6)    ΦΩΩΩΩα     61478.9 ( 42.2)
ΩΩΩΩΩα      9801.0 ( 56.0)    ΦΦΦΦΦΦ      3039.6 ( 19.5)
ΦΦΦΦΦΩ     15587.7 ( 27.7)    ΦΦΦΦΩΩ     32516.8 ( 38.5)
ΦΦΦΩΩΩ     34975.6 ( 51.8)    ΦΦΩΩΩΩ     20215.5 ( 66.6)
ΦΩΩΩΩΩ      5879.9 ( 80.7)    ΩΩΩΩΩΩ       667.0 ( 91.5)



Library size = 1.0000E+07
total = 8.1204E+06    % sampled = 12.69

Class       Number    (%)      Class       Number    (%)
αααααα [illegible] (  1.1)    Φααααα    166342.4 (  2.2)
Ωααααα [illegible] (  3.3)    ΦΦαααα    342685.7 (  4.4)
ΦΩαααα    609987.6 (  6.5)    ΩΩαααα    269958.3 (  9.6)
ΦΦΦααα    372371.8 (  8.6)    ΦΦΩααα    983416.4 ( 12.6)
ΦΩΩααα [illegible] ( 18.4)    ΩΩΩααα    244761.5 ( 26.2)
ΦΦΦΦαα    222702.0 ( 16.5)    ΦΦΦΩαα    767692.5 ( 23.7)
ΦΦΩΩαα    972324.6 ( 33.3)    ΦΩΩΩαα    531651.3 ( 45.6)
ΩΩΩΩαα    104722.3 ( 59.9)    ΦΦΦΦΦα     68111.0 ( 30.3)
ΦΦΦΦΩα    281976.3 ( 41.8)    ΦΦΦΩΩα    450120.2 ( 55.6)
ΦΦΩΩΩα    342072.1 ( 70.4)    ΦΩΩΩΩα    122302.6 ( 83.9)
ΩΩΩΩΩα     16364.0 ( 93.5)    ΦΦΦΦΦΦ      8028.0 ( 51.4)
ΦΦΦΦΦΩ     37179.9 ( 66.1)    ΦΦΦΦΩΩ     67719.5 ( 80.3)
ΦΦΦΩΩΩ     61580.0 ( 91.2)    ΦΦΩΩΩΩ     29586.1 ( 97.4)
ΦΩΩΩΩΩ      7259.5 ( 99.6)    ΩΩΩΩΩΩ       728.8 (100.0)




Library size = 3.0000E+07
total = 1.8633E+07    % sampled = 29.11

Class       Number    (%)      Class       Number    (%)
αααααα     99247.4 (  3.3)    Φααααα    487990.0 (  6.5)
Ωααααα    431933.3 (  9.6)    ΦΦαααα    983416.5 ( 12.6)
ΦΩαααα   1712943.0 ( 18.4)    ΩΩαααα    734284.6 ( 26.2)
ΦΦΦααα   1023590.0 ( 23.7)    ΦΦΩααα   2592866.0 ( 33.3)
ΦΩΩααα   2126605.0 ( 45.6)    ΩΩΩααα    558519.0 ( 59.9)
ΦΦΦΦαα    563952.6 ( 41.8)    ΦΦΦΩαα   1800481.0 ( 55.6)
ΦΦΩΩαα   2052433.0 ( 70.4)    ΦΩΩΩαα    978420.5 ( 83.9)
ΩΩΩΩαα    163640.3 ( 93.5)    ΦΦΦΦΦα    148719.7 ( 66.1)
ΦΦΦΦΩα    541755.7 ( 80.3)    ΦΦΦΩΩα    738960.1 ( 91.2)
ΦΦΩΩΩα    473377.0 ( 97.4)    ΦΩΩΩΩα    145189.7 ( 99.6)
ΩΩΩΩΩα     17491.3 (100.0)    ΦΦΦΦΦΦ     13829.1 ( 88.5)
ΦΦΦΦΦΩ     54058.1 ( 96.1)    ΦΦΦΦΩΩ     83726.0 ( 99.2)
ΦΦΦΩΩΩ     67454.5 ( 99.9)    ΦΦΩΩΩΩ     30374.5 (100.0)
ΦΩΩΩΩΩ      7290.0 (100.0)    ΩΩΩΩΩΩ       729.0 (100.0)
Library size = 7.6000E+07
total = 3.2125E+07    % sampled = 50.19

Class       Number    (%)      Class       Number    (%)
αααααα    245057.8 (  8.2)    Φααααα   1175010.0 ( 15.7)
Ωααααα   1014733.0 ( 22.7)    ΦΦαααα   2255280.0 ( 29.0)
ΦΩαααα   [illegible]          ΩΩαααα   [illegible]
ΦΦΦααα   2142478.0 ( 49.6)    ΦΦΩααα   4993247.0 ( 64.2)
ΦΩΩααα   [illegible]          ΩΩΩααα    840691.9 ( 90.1)
ΦΦΦΦαα   1007002.0 ( 74.6)    ΦΦΦΩαα   2825063.0 ( 87.2)
ΦΦΩΩαα   [illegible]          ΦΩΩΩαα   [illegible]
ΩΩΩΩαα    174790.0 ( 99.9)    ΦΦΦΦΦα    210475.6 ( 93.5)
ΦΦΦΦΩα    663929.3 ( 98.4)    ΦΦΦΩΩα    808298.6 ( 99.8)
ΦΦΩΩΩα    485953.2 (100.0)    ΦΩΩΩΩα    145799.9 (100.0)
ΩΩΩΩΩα     17496.0 (100.0)    ΦΦΦΦΦΦ     15559.9 ( 99.6)
ΦΦΦΦΦΩ     56234.9 (100.0)    ΦΦΦΦΩΩ     84374.6 (100.0)
ΦΦΦΩΩΩ     67500.0 (100.0)    ΦΦΩΩΩΩ     30375.0 (100.0)
ΦΩΩΩΩΩ      7290.0 (100.0)    ΩΩΩΩΩΩ       729.0 (100.0)


Library size = 1.0000E+08
total = 3.6537E+07    % sampled = 57.09

Class       Number    (%)      Class       Number    (%)
αααααα    318185.1 ( 10.7)    Φααααα   1506161.0 ( 20.2)
Ωααααα   1284677.0 ( 28.7)    ΦΦαααα   2821285.0 ( 36.3)
ΦΩαααα   4585163.0 ( 49.1)    ΩΩαααα   1783932.0 ( 63.7)
ΦΦΦααα   2566085.0 ( 59.4)    ΦΦΩααα   5764391.0 ( 74.1)
ΦΩΩααα   4051713.0 ( 86.8)    ΩΩΩααα    888584.3 ( 95.2)
ΦΦΦΦαα   1127473.0 ( 83.5)    ΦΦΦΩαα   3023170.0 ( 93.3)
ΦΦΩΩαα   2865517.0 ( 98.3)    ΦΩΩΩαα   1163743.0 ( 99.8)
ΩΩΩΩαα    174941.0 (100.0)    ΦΦΦΦΦα    218886.6 ( 97.3)
ΦΦΦΦΩα    671976.9 ( 99.6)    ΦΦΦΩΩα    809757.3 (100.0)
ΦΦΩΩΩα    485997.5 (100.0)    ΦΩΩΩΩα    145800.0 (100.0)
ΩΩΩΩΩα     17496.0 (100.0)    ΦΦΦΦΦΦ     15613.5 ( 99.9)
ΦΦΦΦΦΩ     56248.9 (100.0)    ΦΦΦΦΩΩ     84375.0 (100.0)
ΦΦΦΩΩΩ     67500.0 (100.0)    ΦΦΩΩΩΩ     30375.0 (100.0)
ΦΩΩΩΩΩ      7290.0 (100.0)    ΩΩΩΩΩΩ       729.0 (100.0)
Library size = 3.0000E+08
total = 5.2634E+07    % sampled = 82.24

Class       Number    (%)      Class       Number    (%)
αααααα    856451.3 ( 28.7)    Φααααα   3668130.0 ( 49.1)
Ωααααα   2854291.0 ( 63.7)    ΦΦαααα   5764391.0 ( 74.1)
ΦΩαααα   8103426.0 ( 86.8)    ΩΩαααα   2665753.0 ( 95.2)
ΦΦΦααα   4030893.0 ( 93.3)    ΦΦΩααα   7641378.0 ( 98.3)
ΦΩΩααα   4654972.0 ( 99.8)    ΩΩΩααα    933018.6 (100.0)
ΦΦΦΦαα   1343954.0 ( 99.6)    ΦΦΦΩαα   3239029.0 (100.0)
ΦΦΩΩαα   2915985.0 (100.0)    ΦΩΩΩαα   1166400.0 (100.0)
ΩΩΩΩαα    174960.0 (100.0)    ΦΦΦΦΦα    224995.5 (100.0)
ΦΦΦΦΩα    674999.9 (100.0)    ΦΦΦΩΩα    810000.0 (100.0)
ΦΦΩΩΩα    486000.0 (100.0)    ΦΩΩΩΩα    145800.0 (100.0)
ΩΩΩΩΩα     17496.0 (100.0)    ΦΦΦΦΦΦ     15625.0 (100.0)
ΦΦΦΦΦΩ     56250.0 (100.0)    ΦΦΦΦΩΩ     84375.0 (100.0)
ΦΦΦΩΩΩ     67500.0 (100.0)    ΦΦΩΩΩΩ     30375.0 (100.0)
ΦΩΩΩΩΩ      7290.0 (100.0)    ΩΩΩΩΩΩ       729.0 (100.0)



Library size = 1.0000E+09
total = 6.1999E+07    % sampled = 96.87

Class       Number    (%)      Class       Number    (%)
αααααα   2018278.0 ( 67.6)    Φααααα   6680917.0 ( 89.5)
Ωααααα   4326519.0 ( 96.6)    ΦΦαααα   7690221.0 ( 98.9)
ΦΩαααα   9320389.0 ( 99.9)    ΩΩαααα   2799250.0 (100.0)
ΦΦΦααα   4319475.0 (100.0)    ΦΦΩααα   7775990.0 (100.0)
ΦΩΩααα   4665600.0 (100.0)    ΩΩΩααα    933120.0 (100.0)
ΦΦΦΦαα   1350000.0 (100.0)    ΦΦΦΩαα   3240000.0 (100.0)
ΦΦΩΩαα   2916000.0 (100.0)    ΦΩΩΩαα   1166400.0 (100.0)
ΩΩΩΩαα    174960.0 (100.0)    ΦΦΦΦΦα    225000.0 (100.0)
ΦΦΦΦΩα    675000.0 (100.0)    ΦΦΦΩΩα    810000.0 (100.0)
ΦΦΩΩΩα    486000.0 (100.0)    ΦΩΩΩΩα    145800.0 (100.0)
ΩΩΩΩΩα     17496.0 (100.0)    ΦΦΦΦΦΦ     15625.0 (100.0)
ΦΦΦΦΦΩ     56250.0 (100.0)    ΦΦΦΦΩΩ     84375.0 (100.0)
ΦΦΦΩΩΩ     67500.0 (100.0)    ΦΦΩΩΩΩ     30375.0 (100.0)
ΦΩΩΩΩΩ      7290.0 (100.0)    ΩΩΩΩΩΩ       729.0 (100.0)
Library size = 3.0000E+09
total = 6.3890E+07    % sampled = 99.83

Class       Number    (%)      Class       Number    (%)
αααααα   2884346.0 ( 96.6)    Φααααα   7456311.0 ( 99.9)
Ωααααα   4478800.0 (100.0)    ΦΦαααα   7775990.0 (100.0)
ΦΩαααα   9331200.0 (100.0)    ΩΩαααα   2799360.0 (100.0)
ΦΦΦααα   4320000.0 (100.0)    ΦΦΩααα   7776000.0 (100.0)
ΦΩΩααα   4665600.0 (100.0)    ΩΩΩααα    933120.0 (100.0)
ΦΦΦΦαα   1350000.0 (100.0)    ΦΦΦΩαα   3240000.0 (100.0)
ΦΦΩΩαα   2916000.0 (100.0)    ΦΩΩΩαα   1166400.0 (100.0)
ΩΩΩΩαα    174960.0 (100.0)    ΦΦΦΦΦα    225000.0 (100.0)
ΦΦΦΦΩα    675000.0 (100.0)    ΦΦΦΩΩα    810000.0 (100.0)
ΦΦΩΩΩα    486000.0 (100.0)    ΦΩΩΩΩα    145800.0 (100.0)
ΩΩΩΩΩα     17496.0 (100.0)    ΦΦΦΦΦΦ     15625.0 (100.0)
ΦΦΦΦΦΩ     56250.0 (100.0)    ΦΦΦΦΩΩ     84375.0 (100.0)
ΦΦΦΩΩΩ     67500.0 (100.0)    ΦΦΩΩΩΩ     30375.0 (100.0)
ΦΩΩΩΩΩ      7290.0 (100.0)    ΩΩΩΩΩΩ       729.0 (100.0)
Table 130, continued
Formulae for tabulated quantities.

Lsize is the number of independent transformants
31**6 is 31 to sixth power; 6*3 means 6 times 3.
A = Lsize/(31**6)

α can be one of [W, M, F, Y, C, I, K, D, E, N, H, Q]
Φ can be one of [P, T, A, V, G]
Ω can be one of [S, L, R]

F0 = (12)**6    F1 = (12)**5    F2 = (12)**4
F3 = (12)**3    F4 = (12)**2    F5 = (12)
F6 = 1

αααααα = F0 * (1-exp(-A))
Φααααα = 6 * 5 * F1 * (1-exp(-2*A))
Ωααααα = 6 * 3 * F1 * (1-exp(-3*A))
ΦΦαααα = (15) * 5**2 * F2 * (1-exp(-4*A))
ΦΩαααα = (6*5)*5*3 * F2 * (1-exp(-6*A))
ΩΩαααα = (15) * 3**2 * F2 * (1-exp(-9*A))
ΦΦΦααα = (20)*(5**3) * F3 * (1-exp(-8*A))
ΦΦΩααα = (60)*(5*5*3)*F3*(1-exp(-12*A))
ΦΩΩααα = (60)*(5*3*3)*F3*(1-exp(-18*A))
ΩΩΩααα = (20)*(3)**3*F3*(1-exp(-27*A))
ΦΦΦΦαα = (15)*(5)**4 * F4 * (1-exp(-16*A))
ΦΦΦΩαα = (60)*(5)**3*3*F4*(1-exp(-24*A))
ΦΦΩΩαα = (90)*(5*5*3*3)*F4*(1-exp(-36*A))
ΦΩΩΩαα = (60)*(5*3*3*3)*F4*(1-exp(-54*A))
ΩΩΩΩαα = (15)*(3)**4 * F4 * (1-exp(-81*A))
ΦΦΦΦΦα = (6)*(5)**5 * F5 * (1-exp(-32*A))
ΦΦΦΦΩα = 30*5*5*5*5*3*F5*(1-exp(-48*A))
ΦΦΦΩΩα = 60*5*5*5*3*3*F5*(1-exp(-72*A))
ΦΦΩΩΩα = 60*5*5*3*3*3*F5*(1-exp(-108*A))
ΦΩΩΩΩα = 30*5*3*3*3*3*F5*(1-exp(-162*A))
ΩΩΩΩΩα = 6*3*3*3*3*3*F5*(1-exp(-243*A))
ΦΦΦΦΦΦ = 5**6 * (1-exp(-64*A))
ΦΦΦΦΦΩ = 6*3*5**5*(1-exp(-96*A))
ΦΦΦΦΩΩ = 15*3*3*5**4*(1-exp(-144*A))
ΦΦΦΩΩΩ = 20*3**3*5**3*(1-exp(-216*A))
ΦΦΩΩΩΩ = 15*3**4*5**2*(1-exp(-324*A))
ΦΩΩΩΩΩ = 6*3**5*5*(1-exp(-486*A))
ΩΩΩΩΩΩ = 3**6*(1-exp(-729*A))

total = αααααα + Φααααα + Ωααααα + ΦΦαααα + ... + ΦΩΩΩΩΩ + ΩΩΩΩΩΩ
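
The formulae above can be evaluated directly. The sketch below is illustrative only: it assumes each class value is (number of peptides in the class) * (1-exp(-k*A)), where k = 2**(number of Φ) * 3**(number of Ω) is the number of DNA sequences encoding each peptide, which is the pattern the tabulated formulae follow. The function name `expected_distinct` is not from the specification.

```python
from math import exp, factorial

def expected_distinct(lsize):
    """Expected number of distinct peptides per class for lsize independent transformants."""
    a = lsize / 31**6  # A in the formulae above
    expected = {}
    for n_phi in range(7):
        for n_omega in range(7 - n_phi):
            n_alpha = 6 - n_phi - n_omega
            orderings = factorial(6) // (
                factorial(n_alpha) * factorial(n_phi) * factorial(n_omega))
            peptides = orderings * 12**n_alpha * 5**n_phi * 3**n_omega
            dna_per_peptide = 2**n_phi * 3**n_omega
            expected[(n_alpha, n_phi, n_omega)] = peptides * (1 - exp(-dna_per_peptide * a))
    return expected

# Keys are (n_alpha, n_phi, n_omega).
table = expected_distinct(1.0e6)
total = sum(table.values())
print(f"total = {total:.4E}   % sampled = {100 * total / 20**6:.2f}")
```

For Lsize = 1.0000E+06 the result should agree with the first block of part C to within rounding.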
Table 131: Sampling of a Library
Encoded by (NNT)4(NNG)2
(continued)

The following sections show how many sequences of each
class are expected for libraries of different sizes.

Library size = 1.0000E+05
total = 9.9137E+04    fraction sampled = 1.1587E-02

Type        Number    (%)      Type        Number    (%)
xxθθxx     31416.9 (  0.7)    xxθΩxx     22771.4 (  1.3)
xxΩΩxx      4112.4 (  2.7)    xxθθxS     17891.8 (  1.3)
xxθΩxS     12924.6 (  2.7)    xxΩΩxS      2318.5 (  5.3)
xxθθSS      3808.1 (  2.7)    xxθΩSS      2732.5 (  5.3)
xxΩΩSS       483.7 ( 10.3)    xSθθSS       357.8 (  5.3)
xSθΩSS       253.4 ( 10.3)    xSΩΩSS        43.7 ( 19.5)
SSθθSS        12.4 ( 10.3)    SSθΩSS         8.6 ( 19.5)
SSΩΩSS         1.4 ( 35.2)

Library size = 1.0000E+06
total = 9.2064E+05    fraction sampled = 1.0761E-01

xxθθxx    304783.9 (  6.6)    xxθΩxx    214394.0 ( 12.7)
xxΩΩxx     36508.6 ( 23.8)    xxθθxS    168452.5 ( 12.7)
xxθΩxS    114741.4 ( 23.8)    xxΩΩxS     18383.8 ( 41.9)
xxθθSS     33807.7 ( 23.8)    xxθΩSS     21666.6 ( 41.9)
xxΩΩSS      3114.6 ( 66.2)    xSθθSS      2837.3 ( 41.9)
xSθΩSS      1631.5 ( 66.2)    xSΩΩSS       198.4 ( 88.6)
SSθθSS        80.1 ( 66.2)    SSθΩSS        39.0 ( 88.6)
SSΩΩSS         3.9 ( 98.7)

Library size = 3.0000E+06
total = 2.3880E+06    fraction sampled = 2.7912E-01

xxθθxx    855709.5 ( 18.4)    xxθΩxx    565051.6 ( 33.4)
xxΩΩxx     85564.7 ( 55.7)    xxθθxS    443969.1 ( 33.4)
xxθΩxS    268917.8 ( 55.7)    xxΩΩxS     35281.3 ( 80.4)
xxθθSS     79234.7 ( 55.7)    xxθΩSS     41581.5 ( 80.4)
xxΩΩSS      4522.6 ( 96.1)    xSθθSS      5445.2 ( 80.4)
xSθΩSS      2369.0 ( 96.1)    xSΩΩSS       223.7 ( 99.9)
SSθθSS       116.3 ( 96.1)    SSθΩSS        43.9 ( 99.9)
SSΩΩSS         4.0 (100.0)
Library size = 8.5556E+06
total = 4.9303E+06    fraction sampled = 5.7626E-01

Type        Number    (%)      Type        Number    (%)
xxθθxx   2046301.0 ( 44.0)    xxθΩxx   1160645.0 ( 68.7)
xxΩΩxx    138575.9 ( 90.2)    xxθθxS    911935.6 ( 68.7)
xxθΩxS    435524.3 ( 90.2)    xxΩΩxS     43480.7 ( 99.0)
xxθθSS    128324.1 ( 90.2)    xxθΩSS     51245.1 ( 99.0)
xxΩΩSS      4703.6 (100.0)    xSθθSS      6710.7 ( 99.0)
xSθΩSS      2463.8 (100.0)    xSΩΩSS       224.0 (100.0)
SSθθSS       121.0 (100.0)    SSθΩSS        44.0 (100.0)
SSΩΩSS         4.0 (100.0)

Library size = 1.0000E+07
total = 5.3667E+06    fraction sampled = 6.2727E-01

xxθθxx   2289093.0 ( 49.2)    xxθΩxx   1254877.0 ( 74.2)
xxΩΩxx    143467.0 ( 93.4)    xxθθxS    985974.9 ( 74.2)
xxθΩxS    450896.3 ( 93.4)    xxΩΩxS     43710.7 ( 99.6)
xxθθSS    132853.4 ( 93.4)    xxθΩSS     51516.1 ( 99.6)
xxΩΩSS      4703.9 (100.0)    xSθθSS      6746.2 ( 99.6)
xSθΩSS      2464.0 (100.0)    xSΩΩSS       224.0 (100.0)
SSθθSS       121.0 (100.0)    SSθΩSS        44.0 (100.0)
SSΩΩSS         4.0 (100.0)

Library size = 3.0000E+07
total = 7.8961E+06    fraction sampled = 9.2291E-01

xxθθxx   4040589.0 ( 86.9)    xxθΩxx   1661409.0 ( 98.3)
xxΩΩxx    153619.1 (100.0)    xxθθxS   1305393.0 ( 98.3)
xxθΩxS    482802.9 (100.0)    xxΩΩxS     43904.0 (100.0)
xxθθSS    142254.4 (100.0)    xxθΩSS     51744.0 (100.0)
xxΩΩSS      4704.0 (100.0)    xSθθSS      6776.0 (100.0)
xSθΩSS      2464.0 (100.0)    xSΩΩSS       224.0 (100.0)
SSθθSS       121.0 (100.0)    SSθΩSS        44.0 (100.0)
SSΩΩSS         4.0 (100.0)
Library size = 5.0000E+07
total = 8.3956E+06    fraction sampled = 9.8130E-01

Type        Number    (%)      Type        Number    (%)
xxθθxx   4491779.0 ( 96.6)    xxθΩxx   1688387.0 ( 99.9)
xxΩΩxx    153663.8 (100.0)    xxθθxS   1326590.0 ( 99.9)
xxθΩxS    482943.4 (100.0)    xxΩΩxS     43904.0 (100.0)
xxθθSS    142295.8 (100.0)    xxθΩSS     51744.0 (100.0)
xxΩΩSS      4704.0 (100.0)    xSθθSS      6776.0 (100.0)
xSθΩSS      2464.0 (100.0)    xSΩΩSS       224.0 (100.0)
SSθθSS       121.0 (100.0)    SSθΩSS        44.0 (100.0)
SSΩΩSS         4.0 (100.0)

Library size = 1.0000E+08
total = 8.5503E+06    fraction sampled = 9.9938E-01

xxθθxx   4643063.0 ( 99.9)    xxθΩxx   1690302.0 (100.0)
xxΩΩxx    153664.0 (100.0)    xxθθxS   1328094.0 (100.0)
xxθΩxS    482944.0 (100.0)    xxΩΩxS     43904.0 (100.0)
xxθθSS    142296.0 (100.0)    xxθΩSS     51744.0 (100.0)
xxΩΩSS      4704.0 (100.0)    xSθθSS      6776.0 (100.0)
xSθΩSS      2464.0 (100.0)    xSΩΩSS       224.0 (100.0)
SSθθSS       121.0 (100.0)    SSθΩSS        44.0 (100.0)
SSΩΩSS         4.0 (100.0)
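
The Table 131 entries appear to follow the same sampling model as Table 130. The sketch below is an illustration under stated assumptions, not the specification's own formulae: it assumes NNT encodes 14 single-codon residues (x) plus serine with two codons (S), NNG encodes 11 single-codon residues (θ) plus two residues with two codons each (Ω), and A = Lsize/(16**4 * 15**2) counts stop-free DNA sequences. All names are hypothetical.

```python
from math import comb, exp

def nnt4_nng2_expected(lsize):
    """Expected distinct peptides per class for an (NNT)4(NNG)2 library (assumed model)."""
    a = lsize / (16**4 * 15**2)  # transformants per stop-free DNA sequence
    expected = {}
    for n_s in range(5):           # serines among the four NNT positions
        for n_omega in range(3):   # two-codon residues among the two NNG positions
            peptides = (comb(4, n_s) * 14**(4 - n_s)
                        * comb(2, n_omega) * 11**(2 - n_omega) * 2**n_omega)
            dna_per_peptide = 2**n_s * 2**n_omega
            expected[(n_s, n_omega)] = (
                peptides, peptides * (1 - exp(-dna_per_peptide * a)))
    return expected

# Keys are (number of S, number of omega); values are (class size, expected number seen).
classes = nnt4_nng2_expected(1.0e5)
n_peptides = sum(size for size, _ in classes.values())
n_seen = sum(seen for _, seen in classes.values())
print(n_peptides, f"{n_seen:.4E}", f"{n_seen / n_peptides:.4E}")
```

Under these assumptions the 15 classes contain 15**4 * 13**2 peptides in total, and the Lsize = 1.0000E+05 totals should agree with the table to within rounding.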
Table 132: Relative efficiencies of
various simple variegation codons

                           Number of codons
                     5              6              7
vgCodon          #DNA/#AA       #DNA/#AA       #DNA/#AA
                 [#DNA]         [#DNA]         [#DNA]
                 (#AA)          (#AA)          (#AA)

NNK              8.95           13.86          21.49
assuming         [2.86·10^7]    [8.87·10^8]    [2.75·10^10]
stops vanish     (3.2·10^6)     (6.4·10^7)     (1.28·10^9)

NNT              1.38           1.47           1.57
                 [1.05·10^6]    [1.68·10^7]    [2.68·10^8]
                 (7.59·10^5)    (1.14·10^7)    (1.71·10^8)

NNG              2.04           2.36           2.72
assuming         [7.59·10^5]    [1.14·10^7]    [1.71·10^8]
stops vanish     (3.7·10^5)     (4.83·10^6)    (6.27·10^7)
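
The ratios in Table 132 follow from the number of sense codons and the number of distinct amino acids each variegated codon provides. The sketch below is a rough check, assuming 31 sense codons and 20 amino acids for NNK, 16 and 15 for NNT, and 15 and 13 for NNG (stop codons excluded); the dictionary and function names are illustrative.

```python
# (sense codons, distinct amino acids) per variegated codon, stops excluded (assumed).
VG_CODONS = {
    "NNK": (31, 20),
    "NNT": (16, 15),
    "NNG": (15, 13),
}

def efficiency(vg_codon, n_codons):
    """Return (#DNA, #AA, #DNA/#AA) for n_codons variegated positions."""
    dna_per_pos, aa_per_pos = VG_CODONS[vg_codon]
    n_dna = dna_per_pos ** n_codons
    n_aa = aa_per_pos ** n_codons
    return n_dna, n_aa, n_dna / n_aa

for name in VG_CODONS:
    for n in (5, 6, 7):
        n_dna, n_aa, ratio = efficiency(name, n)
        print(f"{name} n={n}: {ratio:.2f} [{n_dna:.2e}] ({n_aa:.2e})")
```

A lower ratio means fewer DNA sequences must be screened per distinct peptide; the printed ratios should match the table to within about 0.01.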
Table 140. Effect of anti-BPTI IgG on phage titer.

Phage Strain    Input      +Anti-BPTI   +Anti-BPTI        Eluted
                                        +Protein A (a)    Phage

M13MP18         100 (b)    98           92                7·10^-4
BPTI.3          100        26           21                6
M13MB48 (c)     100        90           36                0.8
M13MB48 (d)     100        60           40                2.6

(a) Protein A-agarose beads.

(b) Percentage of input phage measured as plaque forming units

(c) Batch number 3

(d) Batch number 4



Table 141 



Strain 



Affect of anti -BPTI or protein A on phage titer. 

+ Ant i - 



Input 



No +Anti- 
Addition BPTI 



+Protein A BPTI 
(a) +Protein A 



M13MP18 100(b) 107 

M13MB48(b) 100 92 



105 
7 . 10" 



72 
58 



65 
<10" 



(a) Protein A-agarose beads 

(b) Percentage of input phage measured as plaque 
forming 

units 

(c) Batch number 5 
Table 142. Effect of anti-BPTI and non-immune serum on phage
titer

Strain        Input     +Anti-BPTI   +NRS (a)   +Anti-BPTI       +NRS
                                                +Protein A (b)   +Protein A

M13MP18       100 (c)   65           104        71               88
M13MB48 (d)   100       30           125        13               121
M13MB48 (e)   100       2            105        0.7              110

(a) Purified IgG from normal rabbit serum.

(b) Protein A-agarose beads.

(c) Percentage of input phage measured as plaque
forming units

(d) Batch number 4

(e) Batch number 5
Table 143. Loss in titer of display phage with anhydrotrypsin.

              Anhydrotrypsin Beads      Streptavidin Beads
Strain        Start      Post           Start      Post
                         Incubation                Incubation

M13MP18       100 (a)    121            ND         ND
M13MB48       100        58             100        98
5AA Pool      100        44             100        93

(a) Plaque forming units expressed as a percentage of input.

Table 144. Binding of Display Phage to Anhydrotrypsin.

Experiment 1.

Strain        Eluted Phage (a)   Relative to M13MP18

M13MP18        0.2                1.0
BPTI-IIIMK     7.9               39.5
M13MB48       11.2               56.0

Experiment 2.

Strain        Eluted Phage (a)   Relative to M13mp18

M13mp18        0.3                1.0
BPTI-IIIMK    12.0               40.0
M13MB56       17.0               56.7

(a) Plaque forming units acid eluted from beads, expressed as
a percentage of the input.
Table 145. Binding of Display Phage to Anhydrotrypsin or
Trypsin.

              Anhydrotrypsin Beads         Trypsin Beads
Strain        Eluted       Relative        Eluted       Relative
              Phage (a)    Binding (b)     Phage        Binding

M13MP18        0.1           1             2.3x10^-4    1.0
BPTI-IIIMK     9.1          91             1.17         5x10^3
M13.3X7       25.0         250             1.4          6x10^3
M13.3X11       9.2          92             0.27         1.2x10^3

(a) Plaque forming units eluted from beads, expressed as a
percentage of the input.

(b) Relative to the non-display phage, M13MP18.

Table 146. Binding of Display Phage to Trypsin or Human
Neutrophil Elastase.

              Trypsin Beads                HNE Beads
Strain        Eluted       Relative        Eluted       Relative
              Phage (a)    Binding (b)     Phage        Binding

M13MP18       5x10^-4         1            3x10^-4      1.0
BPTI-IIIMK    1.0          2000            5x10^-3      16.7
M13MB48       0.13          260            9x10^-3      30.0
M13.3X7       1.15         2300            1x10^-3      3.3
M13.3X11      0.8          1600            2x10^-3      6.7
BPTI3.CL (c)  1x10^-3         2            4.1          1.4x10^4

(a) Plaque forming units acid eluted from the beads, expressed
as a percentage of input.
(b) Relative to the non-display phage, M13MP18.

(c) BPTI-IIIMK (K15L MGNG) (MGNG has SEQ ID NO:12).
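
The "Relative Binding" columns are the eluted percentage for each strain divided by that of the non-display phage, M13MP18. A minimal sketch of this arithmetic for the trypsin-bead column of Table 146 follows; the variable and function names are illustrative.

```python
# Eluted phage on trypsin beads, as a percentage of input (values from Table 146).
ELUTED_TRYPSIN = {
    "M13MP18": 5e-4,     # non-display control
    "BPTI-IIIMK": 1.0,
    "M13MB48": 0.13,
    "M13.3X7": 1.15,
    "M13.3X11": 0.8,
}

def relative_binding(eluted, control="M13MP18"):
    """Divide each strain's eluted percentage by that of the control strain."""
    baseline = eluted[control]
    return {strain: pct / baseline for strain, pct in eluted.items()}

relative = relative_binding(ELUTED_TRYPSIN)
for strain, value in relative.items():
    print(f"{strain:12s} {value:8.0f}")
```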
Table 155

Distance in Å between alpha carbons in octapeptides:

Extended Strand: angle of Cα1-Cα2-Cα3 = 138°

       1      2      3      4      5      6      7      8
1      -
2     3.8     -
3     7.1    3.8     -
4    10.7    7.1    3.8     -
5    14.2   10.7    7.1    3.8     -
6    17.7   14.1   10.7    7.1    3.8     -
7    21.2   17.7   14.1   10.6    7.0    3.8     -
8    24.6   20.9   17.5   13.9   10.6    7.0    3.8     -

Reverse turn between residues 4 and 5.

       1      2      3      4      5      6      7      8
1      -
2     3.8     -
3     7.1    3.8     -
4    10.6    7.0    3.8     -
5    11.6    8.0    6.1    3.8     -
6     9.0    5.8    5.5    5.6    3.8     -
7     6.2    4.1    6.3    8.0    7.0    3.8     -
8     5.8    6.0    9.1   11.6   10.7    7.2    3.8     -

Alpha helix: angle of Cα1-Cα2-Cα3 = 93°

       1      2      3      4      5      6      7      8
1      -
2     3.8     -
3     5.5    3.8     -
4     5.1    5.4    3.8     -
5     6.6    5.3    5.5    3.8     -
6     9.3    7.0    5.6    5.5    3.8     -
7    10.4    9.3    6.9    5.4    5.5    3.8     -
8    11.3   10.7    9.5    6.8    5.6    5.6    3.8     -
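
The i to i+2 distances in Table 155 follow from the 3.8 Å spacing of adjacent alpha carbons and the stated Cα1-Cα2-Cα3 angle. The sketch below applies elementary trigonometry (the base of an isosceles triangle); it is an illustration rather than the method used to generate the table, and the function name is hypothetical.

```python
from math import radians, sin

CA_CA_DISTANCE = 3.8  # Å between adjacent alpha carbons, as tabulated

def ca_i_to_i_plus_2(angle_deg, bond=CA_CA_DISTANCE):
    """Distance between Cα(i) and Cα(i+2) given the Cα(i)-Cα(i+1)-Cα(i+2) angle."""
    return 2 * bond * sin(radians(angle_deg) / 2)

print(round(ca_i_to_i_plus_2(138), 1))  # extended strand
print(round(ca_i_to_i_plus_2(93), 1))   # alpha helix
```

Longer-range distances also depend on the dihedral angles, so only the first off-diagonal pair of each matrix can be checked this way.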
Table 156

Distances between alpha carbons in closed mini-proteins of the
form disulfide cyclo(CXXXXC)

Minimum distance

       1      2      3      4      5
1
2     3.8
3     5.9    3.8
4     5.6    6.0    3.8
5     4.7    5.9    6.0    3.8
6     4.8    5.3    5.1    5.2    3.8

Average distance

       1      2      3      4      5
1
2     3.8
3     6.3    3.8
4     7.5    6.4    3.8
5     7.1    7.5    6.3    3.8
6     5.6    7.5    7.7    6.4    3.8

Maximum distance

       1      2      3      4      5
1
2     3.8
3     6.7    3.8
4     9.0    6.9    3.8
5     8.7    8.8    6.8    3.8
6     6.6    9.2    9.1    6.8    3.8
Table 160: pH Profile of BPTI-III MK phage and EpiNE1 phage
binding to Cat G beads.

BPTI-IIIMK (BPTI has SEQ ID NO:44)

pH      Total pfu in Fraction    Percentage of Input

7       3.7x10^5                 3.7x10^-2
6       3.1x10^5                 3.1x10^-2
5       1.4x10^5                 1.4x10^-2
4.5     3.1x10^4                 3.1x10^-3
4       7.1x10^3                 7.1x10^-4
3.5     2.6x10^3                 2.6x10^-4
3       2.5x10^3                 2.5x10^-4
2.5     8.8x10^2                 8.8x10^-5
2       7.6x10^2                 7.6x10^-5

(total input = 1x10^9 phage)

EpiNE1 (EpiNE1 has SEQ ID NO:51)

pH      Total pfu in Fraction    Percentage of Input

7       2.5x10^5                 1.1x10^-2
6       6.3x10^4                 2.7x10^-3
5       7.4x10^4                 3.1x10^-3
4.5     7.1x10^4                 3.0x10^-3
4       4.1x10^4                 1.7x10^-3
3.5     3.3x10^4                 1.4x10^-3
3       2.5x10^3                 1.1x10^-4
2.5     1.4x10^4                 5.7x10^-4
2       5.2x10^3                 2.2x10^-4

(total input = 2.35x10^8 phage).
TABLE 201

Elution of Bound Fusion Phage from Immobilized
Active Trypsin

Type of       Buffer   Total Plaque-      Percent of     Ratio
Phage                  Forming Units      Input Phage
                       Recovered in       Recovered
                       Elution Buffer

BPTI-III MK   CBS      8.80·10^7          4.7·10^-1      1675
MK            CBS      1.35·10^6          2.8·10^-4
BPTI-III MK   TBS      1.32·10^8          7.2·10^-1      2103
MK            TBS      1.48·10^6          3.4·10^-4





The total input for BPTI-III MK phage was 1.85 -10 10 plaque 
forming units while the input for MK phage was 4.65-10 
plaque -forming units . 
TABLE 202

Elution of BPTI-III MK and BPTI(K15L)-III MA Phage from
Immobilized Trypsin and HNE

Type of Phage        Immobilized   Total Plaque-      Percentage of
                     Protease      Forming Units      Input Phage
                                   in Elution         Recovered
                                   Fraction

BPTI-III MK          Trypsin       2.1·10^7           4.1·10^-1
BPTI-III MK          HNE           2.6·10^5           5·10^-3
BPTI(K15L)-III MA    Trypsin       5.2·10^4           5·10^-3
BPTI(K15L)-III MA    HNE           1.0·10^6           1.0·10^-1

The total input of BPTI-III MK phage was 5.1·10^9 pfu and the
input of BPTI(K15L)-III MA phage was 9.6·10^8 pfu.
TABLE 203

Effect of pH on the Dissociation of
Bound BPTI-III MK and
BPTI(K15L)-III MA Phage from Immobilized HNE

         BPTI-III MK                      BPTI(K15L)-III MA
pH       Total Plaque-    % of Input      Total Plaque-    % of Input
         Forming Units    Phage           Forming Units    Phage
         in Fraction                      in Fraction

7.0      5.0·10^4         2·10^-3         1.7·10^5         3.2·10^-2
6.0      3.8·10^4         2·10^-3         4.5·10^5         8.6·10^-2
5.0      3.5·10^4         1·10^-3         2.1·10^6         4.0·10^-1
4.0      3.0·10^4         1·10^-3         4.3·10^6         8.2·10^-1
3.0      1.4·10^4         1·10^-3         1.1·10^6         2.1·10^-1
2.2      2.9·10^4         1·10^-3         5.9·10^4         1.1·10^-2

         Percentage of                    Percentage of
         Input Phage = 8.0·10^-3          Input Phage = 1.56
         Recovered                        Recovered

The total input of BPTI-III MK phage was
0.030 ml x (8.6·10^10 pfu/ml) = 2.6·10^9.

The total input of BPTI(K15L)-III MA phage was
0.030 ml x (1.7·10^10 pfu/ml) = 5.2·10^8.

Given that the infectivity of BPTI(K15L)-III MA phage is 5
fold lower than that of BPTI-III MK phage, the phage inputs
utilized above ensure that an equivalent number of phage
particles are added to the immobilized HNE.
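
The input and percentage columns are related by simple arithmetic: total input is volume times titer, and each fraction is expressed as a percentage of that input. The sketch below reproduces the BPTI-III MK figures from the stated volume and titer; the function names are illustrative, and small differences from the tabulated values reflect rounding in the table.

```python
def total_input(volume_ml, titer_pfu_per_ml):
    """Total plaque-forming units applied: volume times titer."""
    return volume_ml * titer_pfu_per_ml

def percent_of_input(pfu_in_fraction, input_pfu):
    """Plaque-forming units in a fraction as a percentage of the input."""
    return 100.0 * pfu_in_fraction / input_pfu

mk_input = total_input(0.030, 8.6e10)          # BPTI-III MK phage
mk_ph7 = percent_of_input(5.0e4, mk_input)     # pH 7.0 fraction

print(f"input = {mk_input:.1e} pfu, pH 7.0 fraction = {mk_ph7:.0e} % of input")
```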
TABLE 204

Effect of Mutation of Residues 39 to 42 of BPTI
on the ability of BPTI(K15L)-III MA to Bind to
Immobilized HNE

         BPTI(K15L)-III MA               BPTI(K15L,MGNG)-III MA
                                         (MGNG has SEQ ID NO:12)
pH       Total Plaque-    % Input        Total Plaque-    % Input
         Forming Units                   Forming Units

7.0      3.0·10^5         8.2·10^-2      4.5·10^5         1.63·10^-1
6.0      3.6·10^5         1.00·10^-1     6.3·10^5         2.27·10^-1
5.5      5.3·10^5         1.46·10^-1     7.3·10^5         2.64·10^-1
5.0      5.6·10^5         1.52·10^-1     8.7·10^5         3.16·10^-1
4.75     9.9·10^5         2.76·10^-1     1.3·10^6         4.60·10^-1
4.5      3.1·10^5         8.5·10^-2      3.6·10^5         1.30·10^-1
4.25     5.2·10^5         1.42·10^-1     5.0·10^5         1.80·10^-1
4.0      5.1·10^4         1.4·10^-2      1.3·10^5         4.8·10^-2
3.5      1.3·10^4         4·10^-3        3.8·10^4         1.4·10^-2

         Total                           Total
         Percentage = 1.00               Percentage = 1.80
         Recovered                       Recovered

The total input of BPTI(K15L)-III MA phage was
0.030 ml x (1.2·10^10 pfu/ml) = 3.6·10^8 pfu.

The total input of BPTI(K15L,MGNG)-III MA (MGNG has SEQ ID
NO:12) phage was
0.030 ml x (9.2·10^9 pfu/ml) = 2.8·10^8 pfu.
TABLE 205

Fractionation of a Mixture of
BPTI-III MK and
BPTI(K15L,MGNG)-III MA Phage (MGNG has SEQ ID NO:12)
on Immobilized HNE

         BPTI-III MK                     BPTI(K15L,MGNG)-III MA
                                         (MGNG has SEQ ID NO:12)
pH       Total            % of Input     Total            % of Input
         Kanamycin                       Ampicillin
         Transducing                     Transducing
         Units                           Units

7.0      4.01·10^3        4.5·10^-3      1.39·10^5        3.13·10^-1
6.0      7.06·10^2        8·10^-4        7.18·10^4        1.62·10^-1
5.0      1.81·10^3        2.0·10^-3      1.35·10^5        3.04·10^-1
4.0      1.49·10^3        1.7·10^-3      7.43·10^5        1.673

The total input of BPTI-III MK phage was
0.015 ml x (5.94·10^9 kanamycin transducing units/ml) = 8.91·10^7
kanamycin transducing units.

The total input of BPTI(K15L,MGNG)-III MA (MGNG has SEQ ID
NO:12) phage was
0.015 ml x (2.96·10^9 ampicillin transducing units/ml) =
4.44·10^7 ampicillin transducing units.
TABLE 206

Characterization of the Affinity of
BPTI(K15V,R17L)-III MA Phage for Immobilized HNE
(MGNG has SEQ ID NO:12)

         BPTI(K15V,R17L)-III MA          BPTI(K15L,MGNG)-III MA
pH       Total Plaque-    Percentage     Total Plaque-    Percentage
         Forming Units    of Input       Forming Units    of Input
         Recovered        Phage          Recovered        Phage

7.0      3.19·10^6        8.1·10^-2      9.42·10^4        4.6·10^-2
6.0      5.42·10^6        1.38·10^-1     1.61·10^5        7.9·10^-2
5.0      9.45·10^6        2.41·10^-1     2.85·10^5        1.39·10^-1
4.5      1.39·10^7        3.55·10^-1     4.32·10^5        2.11·10^-1
4.0      2.02·10^7        5.15·10^-1     1.42·10^5        6.9·10^-2
3.75     9.20·10^6        2.35·10^-1
3.5      4.16·10^6        1.06·10^-1     5.29·10^4        2.6·10^-2
3.0      2.65·10^6        6.8·10^-2

         Total Input = 1.73              Total Input = 0.57
         Recovered                       Recovered

Total input of BPTI(K15V,R17L)-III MA phage was
0.040 ml x (9.80·10^10 pfu/ml) = 3.92·10^9 pfu.

Total input of BPTI(K15L,MGNG)-III MA (MGNG has SEQ ID NO:12)
phage was
0.040 ml x (5.13·10^9 pfu/ml) = 2.05·10^8 pfu.
TABLE 207

Sequence of the EpiNEα Clone Selected
From the Mini-Library

 1   1   1   1   1   1   1   2   2
 3   4   5   6   7   8   9   0   1
 P   C   V   A   M   F   Q   R   Y
CCT.TGC.GTG.GCT.ATG.TTC.CAA.CGC.TAT   (SEQ ID NO: 108)

amino acid sequence: SEQ ID NO: 244
TABLE 208

SEQUENCES OF THE EpiNE CLONES IN THE P1 REGION

CLONE
IDENTIFIERS         SEQUENCE

EpiNE3 (amino-acid: SEQ ID NO: 245)
                    111111122
                    345678901
3, 9, 16,           PCVGFFSRY
17, 18, 19          CCT.TGC.GTC.GGT.TTC.TTC.TCA.CGC.TAT
                    (DNA: SEQ ID NO: 109)

EpiNE6 (amino-acid: SEQ ID NO: 246)
                    111111122
                    345678901
6                   PCVGFFQRY
                    CCT.TGC.GTC.GGT.TTC.TTC.CAA.CGC.TAT
                    (DNA: SEQ ID NO: 110)

EpiNE7 (amino-acid: SEQ ID NO: 247)
                    111111122
                    345678901
7, 13, 14,          PCVAMFPRY
15, 20              CCT.TGC.GTC.GCT.ATG.TTC.CCA.CGC.TAT
                    (DNA: SEQ ID NO: 111)

EpiNE4 (amino-acid: SEQ ID NO: 248)
                    111111122
                    345678901
4                   PCVAIFPRY
                    CCT.TGC.GTC.GCT.ATC.TTC.CCA.CGC.TAT
                    (DNA: SEQ ID NO: 112)
TABLE 208

SEQUENCES OF THE EpiNE CLONES IN THE P1 REGION
(continued)

CLONE
IDENTIFIERS         SEQUENCE

EpiNE8 (amino-acid: SEQ ID NO: 249)
                    111111122
                    345678901
8                   PCVAIFKRS
                    CCT.TGC.GTC.GCT.ATC.TTC.AAA.CGC.TCT
                    (DNA: SEQ ID NO: 113)

EpiNE1 (amino-acid: SEQ ID NO: 250)
                    111111122
                    345678901
1, 10,              PCIAFFPRY
11, 12              CCT.TGC.ATC.GCT.TTC.TTC.CCA.CGC.TAT
                    (DNA: SEQ ID NO: 114)

EpiNE5 (amino-acid: SEQ ID NO: 251)
                    111111122
                    345678901
5                   PCIAFFQRY
                    CCT.TGC.ATC.GCT.TTC.TTC.CAA.CGC.TAT
                    (DNA: SEQ ID NO: 115)

EpiNE2 (amino-acid: SEQ ID NO: 252)
                    111111122
                    345678901
2                   PCIALFKRY
                    CCT.TGC.ATC.GCT.TTG.TTC.AAA.CGC.TAT
                    (DNA: SEQ ID NO: 116)
Table 209: DNA sequences and predicted amino acid
sequences around the P1 region of BPTI analogues selected
for binding to Cathepsin G.

Clone                 P1
                      15  16  17  18  19

BPTI                  AAA.GCG.CGC.ATC.ATC    (SEQ ID NO: 253)
(SEQ ID NO: 254)      LYS ALA ARG ILE ILE

EpiC 1 (a)            ATG.GGT.TTC.TCC.AAA    SEQ ID NO: 117
(SEQ ID NO: 255)      MET GLY PHE SER LYS

EpiC 7                ATG.GCT.TTG.TTC.AAA    SEQ ID NO: 118
(SEQ ID NO: 256)      MET ALA LEU PHE LYS

EpiC 8 (b)            TTC.GCT.ATC.ACC.CCA    SEQ ID NO: 119
(SEQ ID NO: 257)      PHE ALA ILE THR PRO

EpiC 10               ATG.GCT.TTG.TTC.CAA    SEQ ID NO: 120
(SEQ ID NO: 258)      MET ALA LEU PHE GLN

EpiC 20               ATG.GCT.ATC.TCC.CCA    SEQ ID NO: 121
(SEQ ID NO: 259)      MET ALA ILE SER PRO

(a) Clones 11 and 31 also had the identical sequence.

(b) Clone 8 also contained the mutation Tyr10 to Asn.
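The correspondence between the dotted DNA sequences and the predicted amino acids in Tables 208 and 209 can be checked with a short translation routine. The sketch below is illustrative only: it assumes the standard genetic code and the dot-separated codon notation used in these tables, and the names `CODON_TABLE` and `translate` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dotted_dna):
    """Translate a dot-separated codon string into one-letter amino acids."""
    codons = [c.strip().upper() for c in dotted_dna.split(".") if c.strip()]
    return "".join(CODON_TABLE[c] for c in codons)


# EpiNE3 from Table 208 (residues 13-21) and EpiC 10 from Table 209 (15-19).
print(translate("CCT.TGC.GTC.GGT.TTC.TTC.TCA.CGC.TAT"))
print(translate("ATG.GCT.TTG.TTC.CAA"))
```

The one-letter output for Table 209 entries corresponds to the three-letter codes printed there (for example M, A, L, F, Q for MET ALA LEU PHE GLN).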



Table 210

Derivatives of EpiNE7 (SEQ ID NO: 48) Obtained
by Variegation at positions 34, 36, 39, 40 and 41

EpiNE7 (SEQ ID NO: 48)
              ♦♦♦♦♦                   ****
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFVYGGCmgngNNFKSAEDCMRTCGGA
         1         2         3         4         5
1234567890123456789012345678901234567890123456789012345678
              |||||              ♦ ♦

EpiNE7.6 (SEQ ID NO: 59)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFlYgGCkgkGNNFKSAEDCMRTCGGA

EpiNE7.8, EpiNE7.9, and EpiNE7.31 (SEQ ID NO: 60)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFeYgGCwakGNNFKSAEDCMRTCGGA

EpiNE7.11 (SEQ ID NO: 61)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFgYaGCrakGNNFKSAEDCMRTCGGA

EpiNE7.7 (SEQ ID NO: 62)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFeYgGChaeGNNFKSAEDCMRTCGGA

EpiNE7.4 and EpiNE7.14 (SEQ ID NO: 63)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFlYgGCwaqGNNFKSAEDCMRTCGGA

EpiNE7.5 (SEQ ID NO: 64)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFrYgGClaeGNNFKSAEDCMRTCGGA

EpiNE7.10 and EpiNE7.20 (SEQ ID NO: 65)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFdYgGChadGNNFKSAEDCMRTCGGA

EpiNE7.1 (SEQ ID NO: 66)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFkYgGClahGNNFKSAEDCMRTCGGA

EpiNE7.16 (SEQ ID NO: 67)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFtYgGCwanGNNFKSAEDCMRTCGGA

EpiNE7.19 (SEQ ID NO: 68)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFnYgGCegkGNNFKSAEDCMRTCGGA

EpiNE7.12 (SEQ ID NO: 69)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFqYgGCegyGNNFKSAEDCMRTCGGA

EpiNE7.17 (SEQ ID NO: 70)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFqYgGClgeGNNFKSAEDCMRTCGGA

EpiNE7.21 (SEQ ID NO: 71)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFhYgGCwgqGNNFKSAEDCMRTCGGA
Table 210: Derivatives of EpiNE7 (SEQ ID NO: 48) Obtained
by Variegation at positions 34, 36, 39, 40 and 41
(continued)

EpiNE7 (SEQ ID NO: 48)
              ♦♦♦♦♦                   ****
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFVYGGCmgngNNFKSAEDCMRTCGGA
         1         2         3         4         5
1234567890123456789012345678901234567890123456789012345678
              |||||              ♦ ♦

EpiNE7.22 (SEQ ID NO: 72)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFhYgGCwgeGNNFKSAEDCMRTCGGA

EpiNE7.23 (SEQ ID NO: 73)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFkYgGCwgkGNNFKSAEDCMRTCGGA

EpiNE7.24 (SEQ ID NO: 74)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFkYgGChgnGNNFKSAEDCMRTCGGA

EpiNE7.25 (SEQ ID NO: 75)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFpYgGCwakGNNFKlAEDCMRTCGGA

EpiNE7.26 (SEQ ID NO: 76)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFkYgGCwghGNNFKSAEDCMRTCGGA

EpiNE7.27 (SEQ ID NO: 77)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFnYgGCwgkGNNFKSAEDCMRTCGGA

EpiNE7.28 (SEQ ID NO: 78)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFtYgGClghGNNFKSAEDCMRTCGGA

EpiNE7.29 (SEQ ID NO: 79)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFtYgGClgyGNNFKSAEDCMRTCGGA

EpiNE7.30, EpiNE7.34, and EpiNE7.35 (SEQ ID NO: 80)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFkYgGCwaeGNNFKSAEDCMRTCGGA

EpiNE7.32 (SEQ ID NO: 81)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFgYgGCwgeGNNFKSAEDCMRTCGGA

EpiNE7.33 (SEQ ID NO: 82)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFeYgGCwanGNNFKSAEDCMRTCGGA

EpiNE7.36 (SEQ ID NO: 83)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFvYgGChgdGNNFKSAEDCMRTCGGA

EpiNE7.37 (SEQ ID NO: 84)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFmYgGCqgkGNNFKSAEDCMRTCGGA
Table 210 (continued)

Derivatives of EpiNE7 (SEQ ID NO: 48) Obtained
by Variegation at positions 34, 36, 39, 40 and 41

EpiNE7.38 (SEQ ID NO: 85)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFyYgGCwakGNNFKSAEDCMRTCGGA

EpiNE7 (SEQ ID NO: 48)
              ♦♦♦♦♦                   ****
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFVYGGCmgngNNFKSAEDCMRTCGGA
         1         2         3         4         5
1234567890123456789012345678901234567890123456789012345678
              |||||              ♦ ♦

EpiNE7.39 (SEQ ID NO: 86)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFmYgGCwgdGNNFKSAEDCMRTCGGA

EpiNE7.40 (SEQ ID NO: 87)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFtYgGChgnGNNFKSAEDCMRTCGGA
Table 210: Derivatives of EpiNE7 Obtained
by Variegation at positions 34, 36, 39, 40 and 41
(continued)

Notes:

a) ♦ indicates variegated residue. * indicates imposed
change. | indicates carry over from EpiNE7.

b) The sequence M39-GNG (SEQ ID NO: 12) in EpiNE7 (indicated
by *) was imposed to increase similarity to ITI-D1.

c) Lower case letters in EpiNE7.6 to 7.38 indicate changes
from BPTI that were selected in the first round (residues
15-19) or positions where the PBD was variegated in the
second round (residues 34, 36, 39, 40, and 41).

d) All EpiNE7 derivatives have G42.
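The substitutions that distinguish a derivative from EpiNE7 can be listed by comparing the two 58-residue strings position by position. The sketch below is an illustration, not part of the specification: the function name `substitutions` is hypothetical, the sequences are the EpiNE7 and EpiNE7.6 entries of Table 210 with the stray space inside the printed strings removed, and case is ignored because lower case in the table only flags selected or variegated positions.

```python
EPINE7   = "RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFVYGGCmgngNNFKSAEDCMRTCGGA"
EPINE7_6 = "RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFlYgGCkgkGNNFKSAEDCMRTCGGA"


def substitutions(parent, variant):
    """Return (position, parent residue, variant residue) for each difference.

    Positions are 1-based, matching the numbering ruler in Table 210.
    """
    if len(parent) != len(variant):
        raise ValueError("sequences must have the same length")
    return [
        (i + 1, p, v)
        for i, (p, v) in enumerate(zip(parent.upper(), variant.upper()))
        if p != v
    ]


for pos, old, new in substitutions(EPINE7, EPINE7_6):
    print(f"{old}{pos}{new}")
```

For this pair the differences fall only within the variegated positions 34, 36, 39, 40 and 41, as the table heading would lead one to expect.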
TABLE 211

Effects of antisera on phage infectivity

Phage
(dilution       Incubation                      Relative
of stock)       Conditions      pfu/ml          Titer

MA-ITI          PBS             1.2·10^11       1.00
(10^-1)         NRS             6.8·10^10       0.57
                anti-ITI        1.1·10^10       0.09

MA-ITI          PBS             7.7·10^8        1.00
(10^-3)         NRS             6.7·10^8        0.87
                anti-ITI        8.0·10^6        0.01

MA              PBS             1.3·10^12       1.00
(10^-1)         NRS             1.4·10^12       1.10
                anti-ITI        1.6·10^12       1.20

MA              PBS             1.3·10^10       1.00
(10^-3)         NRS             1.2·10^10       0.92
                anti-ITI        1.5·10^10       1.20
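The "Relative Titer" column appears to be each titer divided by the PBS titer of the same phage and dilution. A minimal sketch under that assumption follows; the function name `relative_titer` is hypothetical.

```python
def relative_titer(titer_pfu_per_ml, pbs_titer_pfu_per_ml):
    """Titer after incubation, normalized to the PBS control (assumed definition)."""
    return titer_pfu_per_ml / pbs_titer_pfu_per_ml


# MA-ITI at the 10^-1 dilution, values from Table 211.
pbs = 1.2e11
for condition, titer in [("PBS", 1.2e11), ("NRS", 6.8e10), ("anti-ITI", 1.1e10)]:
    print(condition, round(relative_titer(titer, pbs), 2))
```

With this normalization the MA-ITI rows of the table are reproduced. Some MA entries in the table differ slightly from the computed quotient, presumably because the printed titers are rounded to two significant figures.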
TABLE 212

Fractionation of EpiNE-7 and MA-ITI phage on HNE beads

                    EpiNE-7                       MA-ITI

Sample          Total pfu     Fraction        Total pfu     Fraction
                in sample     of input        in sample     of input

INPUT           3.3·10^9      1.00            3.4·10^11     1.00

Final
TBS-TWEEN
Wash            [illegible]   [illegible]     [illegible]   [illegible]

pH 7.0          6.2·10^5      1.8·10^-4       1.6·10^6      4.7·10^-6
pH 6.0          1.4·10^6      4.1·10^-4       1.0·10^6      2.9·10^-6
pH 5.5          9.4·10^5      2.8·10^-4       1.6·10^6      4.7·10^-6
pH 5.0          9.5·10^5      2.9·10^-4       3.1·10^5      9.1·10^-7
pH 4.5          1.2·10^6      3.5·10^-4       1.2·10^5      3.5·10^-7
pH 4.0          1.6·10^6      4.8·10^-4       7.2·10^4      2.1·10^-7
pH 3.5          9.5·10^5      2.9·10^-4       4.9·10^4      1.4·10^-7
pH 3.0          6.6·10^5      2.0·10^-4       2.9·10^4      8.5·10^-8
pH 2.5          1.6·10^5      4.8·10^-5       1.4·10^4      4.1·10^-8
pH 2.0          3.0·10^5      9.1·10^-5       1.7·10^4      5.0·10^-8

SUM*            6.4·10^6      3·10^-3         5.7·10^6      2·10^-5

*SUM is the total pfu (or fraction of input) obtained from
all pH elution fractions
TABLE 213

Fractionation of EpiC-10 and MA-ITI phage on Cat-G beads

                    EpiC-10                       MA-ITI

Sample          Total pfu     Fraction        Total pfu     Fraction
                in sample     of input        in sample     of input

INPUT           5.0·10^11     1.00            4.6·10^11     1.00

Final
TBS-TWEEN
Wash            1.8·10^7      3.6·10^-5       7.1·10^6      1.5·10^-5

pH 7.0          1.5·10^7      3.0·10^-5       6.1·10^6      1.3·10^-5
pH 6.0          2.3·10^7      4.6·10^-5       2.3·10^6      5.0·10^-6
pH 5.5          2.5·10^7      5.0·10^-5       1.2·10^6      2.6·10^-6
pH 5.0          2.1·10^7      4.2·10^-5       1.1·10^6      2.4·10^-6
pH 4.5          1.1·10^7      2.2·10^-5       6.7·10^5      1.5·10^-6
pH 4.0          1.9·10^6      3.8·10^-6       4.4·10^5      9.6·10^-7
pH 3.5          1.1·10^6      2.2·10^-6       4.4·10^5      9.6·10^-7
pH 3.0          4.8·10^5      9.6·10^-7       3.6·10^5      7.8·10^-7
pH 2.5          2.0·10^5      4.0·10^-7       2.7·10^5      5.9·10^-7
pH 2.0          2.4·10^5      4.8·10^-7       3.2·10^5      7.0·10^-7

SUM*            9.9·10^7      2·10^-4         1.4·10^7      3·10^-5

*SUM is the total pfu (or fraction of input) obtained from
all pH elution fractions
TABLE 214

Abbreviated fractionation of display phage on HNE beads

                              DISPLAY PHAGE

            EpiNE-7       MA-ITI 2      MA-ITI-E7 1   MA-ITI-E7 2

INPUT       1.00          1.00          1.00          1.00
(pfu)       (1.8·10^9)    (1.2·10^10)   (3.3·10^9)    (1.1·10^9)

WASH        6·10^-5       1·10^-5       2·10^-5       2·10^-5
pH 7.0      3·10^-4       1·10^-5       2·10^-5       4·10^-5
pH 3.5      3·10^-3       3·10^-6       8·10^-5       8·10^-5
pH 2.0      1·10^-3       1·10^-6       6·10^-6       2·10^-5

SUM*        4.3·10^-3     1.4·10^-5     1.1·10^-4     1.4·10^-4

*SUM is the total fraction of input pfu obtained from all pH
elution fractions
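The SUM row of Table 214 can be reproduced by adding the three pH elution fractions for each phage. The sketch below is illustrative: the dictionary layout and the name `sum_of_elutions` are hypothetical, and leaving out the WASH row is an inference from the arithmetic (the printed sums match only when the wash is excluded), not something the footnote states explicitly.

```python
import math

# Fraction of input pfu in each pH elution (pH 7.0, 3.5, 2.0), from Table 214.
ELUTION_FRACTIONS = {
    "EpiNE-7":     [3e-4, 3e-3, 1e-3],
    "MA-ITI 2":    [1e-5, 3e-6, 1e-6],
    "MA-ITI-E7 1": [2e-5, 8e-5, 6e-6],
    "MA-ITI-E7 2": [4e-5, 8e-5, 2e-5],
}


def sum_of_elutions(fractions):
    """Total fraction of input recovered across the pH elution steps."""
    return math.fsum(fractions)


for phage, fractions in ELUTION_FRACTIONS.items():
    print(f"{phage}: {sum_of_elutions(fractions):.1e}")
```

Dividing one sum by another gives a rough measure of relative retention on the beads; on these figures EpiNE-7 phage are recovered at roughly 300 times the fraction seen for MA-ITI phage.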
TABLE 215

Fractionation of EpiNE-7 and MA-ITI-E7 phage on HNE beads

                    EpiNE-7                       MA-ITI-E7

Sample          Total pfu     Fraction        Total pfu     Fraction
                in sample     of input        in sample     of input

INPUT           1.8·10^9      1.00            3.0·10^9      1.00

pH 7.0          5.2·10^5      2.9·10^-4       6.4·10^4      2.1·10^-5
pH 6.0          6.4·10^5      3.6·10^-4       4.5·10^4      1.5·10^-5
pH 5.5          7.8·10^5      4.3·10^-4       5.0·10^4      1.7·10^-5
pH 5.0          8.4·10^5      4.7·10^-4       5.2·10^4      1.7·10^-5
pH 4.5          1.1·10^6      6.1·10^-4       4.4·10^4      1.5·10^-5
pH 4.0          1.7·10^6      9.4·10^-4       2.6·10^4      8.7·10^-6
pH 3.5          1.1·10^6      6.1·10^-4       1.3·10^4      4.3·10^-6
pH 3.0          3.8·10^5      2.1·10^-4       5.6·10^3      1.9·10^-6
pH 2.5          2.8·10^5      1.6·10^-4       4.9·10^3      1.6·10^-6
pH 2.0          2.9·10^5      1.6·10^-4       2.2·10^3      7.3·10^-7

SUM*            7.6·10^6      4.1·10^-3       3.1·10^5      1.1·10^-4

*SUM is the total pfu (or fraction of input) obtained from
all pH elution fractions
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